Methods and compositions for polypeptide engineering

ABSTRACT

Methods are provided for the evolution of proteins of industrial and pharmaceutical interest, including methods for effecting recombination and selection. Compositions produced by these methods are also disclosed.

[0001] This application is a continuation-in-part of U.S. patent application Ser. No. 08/198,431, filed Feb. 17, 1994, Ser. No. PCT/US95/02126, filed Feb. 17, 1995, Ser. No. 08/425,684, filed Apr. 18, 1995, Ser. No. 08/537,874, filed Oct. 30, 1995, Ser. No. 08/564,955, filed Nov. 30, 1995, Ser. No. 08/621,859, filed Mar. 25, 1996, Ser. No. 08/621,430, filed Mar. 25, 1996, Ser. No. PCT/US96/05480, filed Apr. 18, 1996, Ser. No. 08/650,400, filed May 20, 1996, Ser. No. 08/675,502, filed Jul. 3, 1996, Ser. No. 08/721,824, filed Sep. 27, 1996, and Ser. No. 08/722,660, filed Sep. 27, 1996, the specifications of which are herein incorporated by reference in their entirety for all purposes.

BACKGROUND OF THE INVENTION

[0002] Recursive sequence recombination entails performing iterative cycles of recombination and screening or selection to “evolve” individual genes, whole plasmids or viruses, multigene clusters, or even whole genomes (Stemmer, Bio/Technology 13:549-553 (1995)). Such techniques do not require the extensive analysis and computation required by conventional methods for polypeptide engineering. Recursive sequence recombination allows the recombination of large numbers of mutations in a minimum number of selection cycles, in contrast to traditional, pairwise recombination events.

[0003] Thus, recursive sequence recombination (RSR) techniques provide particular advantages in that they provide recombination between mutations in any or all of these, thereby providing a very fast way of exploring the manner in which different combinations of mutations can affect a desired result.

[0004] In some instances, however, structural and/or functional information is available which, although not required for recursive sequence recombination, provides opportunities for modification of the technique. In other instances, selection and/or screening of a large number of recombinants can be costly or time-consuming. A further problem can be the manipulation of large nucleic acid molecules. The instant invention addresses these issues and others.

SUMMARY OF THE INVENTION

[0005] One aspect of the invention is a method for evolving a protein encoded by a DNA substrate molecule comprising:

[0006] (a) digesting at least a first and second DNA substrate molecule, wherein the at least a first and second substrate molecules differ from each other in at least one nucleotide, with a restriction endonuclease;

[0007] (b) ligating the mixture to generate a library of recombinant DNA molecules;

[0008] (c) screening or selecting the products of (b) for a desired property; and

[0009] (d) recovering a recombinant DNA substrate molecule encoding an evolved protein.
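By way of illustration only, steps (a) and (b) of the foregoing method can be sketched in silico. The sketch below is a simplified model and not a laboratory protocol. It assumes that all substrates carry the same restriction sites in the same order, so that fragments occupying the same position are interchangeable, and that ligation restores the original fragment order. The function names, the example sequences, and the choice of EcoRI (recognition site GAATTC, cut after the first G) are assumptions made for the example.

```python
from itertools import product


def digest(seq, site="GAATTC", cut_offset=1):
    """Cut seq at every occurrence of site, cut_offset bases into the site."""
    fragments, start = [], 0
    idx = seq.find(site)
    while idx != -1:
        fragments.append(seq[start:idx + cut_offset])
        start = idx + cut_offset
        idx = seq.find(site, idx + 1)
    fragments.append(seq[start:])
    return fragments


def ligation_library(substrates, site="GAATTC", cut_offset=1):
    """Return every ordered re-ligation of positionally matched fragments."""
    digests = [digest(s, site, cut_offset) for s in substrates]
    n = len(digests[0])
    if any(len(d) != n for d in digests):
        raise ValueError("substrates must share the same number of sites")
    # For each fragment position, collect the distinct variants available.
    slots = [sorted({d[i] for d in digests}) for i in range(n)]
    return {"".join(choice) for choice in product(*slots)}


if __name__ == "__main__":
    parents = ["AAAAGAATTCCCCC", "AATAGAATTCCCGC"]
    for member in sorted(ligation_library(parents)):
        print(member)
```

In this model the two parents differ once on each side of the single site, so the library contains the two parents plus the two reciprocal recombinants. Screening or selection, steps (c) and (d), would then be applied to the library members.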

[0010] A further aspect of the invention is a method for evolving a protein encoded by a DNA substrate molecule by recombining at least a first and second DNA substrate molecule, wherein the at least a first and second substrate molecules differ from each other in at least one nucleotide and comprise defined segments, the method comprising:

[0011] (a) providing a set of oligonucleotide PCR primers, comprising at least one primer for each segment, wherein the primer sequence is complementary to at least one junction with another segment;

[0012] (b) amplifying the segments of the at least a first and second DNA substrate molecules with the primers of step (a) in a polymerase chain reaction;

[0013] (c) assembling the products of step (b) to generate a library of recombinant DNA substrate molecules;

[0014] (d) screening or selecting the products of (c) for a desired property; and

[0015] (e) recovering a recombinant DNA substrate molecule from (d) encoding an evolved protein.

[0016] A further aspect of the invention is a method of enriching a population of DNA fragments for mutant sequences comprising:

[0017] (a) denaturing and renaturing the population of fragments to generate a population of hybrid double-stranded fragments in which at least one double-stranded fragment comprises at least one base pair mismatch;

[0018] (b) fragmenting the products of (a) into fragments of about 20-100 bp;

[0019] (c) affinity-purifying fragments having a mismatch on an affinity matrix to generate a pool of DNA fragments enriched for mutant sequences; and

[0020] (d) assembling the products of (c) to generate a library of recombinant DNA substrate molecules.

[0021] A further aspect of the invention is a method for evolving a protein encoded by a DNA substrate molecule, by recombining at least a first and second DNA substrate molecule, wherein the at least a first and second substrate molecules share a region of sequence homology of about 10 to 100 base pairs and comprise defined segments, the method comprising:

[0022] (a) providing regions of homology in the at least a first and second DNA substrate molecules by inserting an intron sequence between at least two defined segments;

[0023] (b) fragmenting and recombining DNA substrate molecules of (a), wherein regions of homology are provided by the introns;

[0024] (c) screening or selecting the products of (b) for a desired property; and

[0025] (d) recovering a recombinant DNA substrate molecule from the products of (c) encoding an evolved protein.

[0026] A further aspect of the invention is a method for evolving a protein encoded by a DNA substrate molecule by recombining at least a first and second DNA substrate molecule, wherein the at least a first and second substrate molecules differ from each other in at least one nucleotide and comprise defined segments, the method comprising:

[0027] (a) providing a set of oligonucleotide PCR primers, wherein for each strand of each segment a pair of primers is provided, one member of each pair bridging the junction at one end of the segment and the other bridging the junction at the other end of the segment, with the terminal ends of the DNA molecule having as one member of the pair a generic primer, and wherein a set of primers is provided for each of the at least a first and second substrate molecules;

[0028] (b) amplifying the segments of the at least a first and second DNA substrate molecules with the primers of (a) in a polymerase chain reaction;

[0029] (c) assembling the products of (b) to generate a pool of recombinant DNA molecules;

[0030] (d) selecting or screening the products of (c) for a desired property; and

[0031] (e) recovering a recombinant DNA substrate molecule from the products of (d) encoding an evolved protein.

[0032] A further aspect of the invention is a method for optimizing expression of a protein by evolving the protein, wherein the protein is encoded by a DNA substrate molecule, comprising:

[0033] (a) providing a set of oligonucleotides, wherein each oligonucleotide comprises at least two regions complementary to the DNA molecule and at least one degenerate region, each degenerate region encoding a region of an amino acid sequence of the protein;

[0034] (b) assembling the set of oligonucleotides into a library of full-length genes;

[0035] (c) expressing the products of (b) in a host cell;

[0036] (d) screening the products of (c) for improved expression of the protein; and

[0037] (e) recovering a recombinant DNA substrate molecule encoding an evolved protein from (d).

[0038] A further aspect of the invention is a method for optimizing expression of a protein encoded by a DNA substrate molecule by evolving the protein, wherein the DNA substrate molecule comprises at least one lac operator and a fusion of a DNA sequence encoding the protein with a DNA sequence encoding a lac headpiece dimer, the method comprising:

[0039] (a) transforming a host cell with a library of mutagenized DNA substrate molecules;

[0040] (b) inducing expression of the protein encoded by the library of (a);

[0041] (c) preparing an extract of the product of (b);

[0042] (d) fractionating insoluble protein from complexes of soluble protein and DNA; and

[0043] (e) recovering a DNA substrate molecule encoding an evolved protein from (d).

[0044] A further aspect of the invention is a method for evolving functional expression of a protein encoded by a DNA substrate molecule comprising a fusion of a DNA sequence encoding the protein with a DNA sequence encoding filamentous phage protein to generate a fusion protein, the method comprising:

[0045] (a) providing a host cell producing infectious particles expressing a fusion protein encoded by a library of mutagenized DNA substrate molecules;

[0046] (b) recovering from (a) infectious particles displaying the fusion protein;

[0047] (c) affinity purifying particles displaying the mutant protein using a ligand for the protein; and

[0048] (d) recovering a DNA substrate molecule encoding an evolved protein from affinity purified particles of (c).

[0049] A further aspect of the invention is a method for optimizing expression of a protein encoded by a DNA substrate molecule comprising a fusion of a DNA sequence encoding the protein with a lac headpiece dimer, wherein the DNA substrate molecule is present on a first plasmid vector, the method comprising:

[0050] (a) providing a host cell transformed with the first vector and a second vector comprising a library of mutants of at least one chaperonin gene, and at least one lac operator;

[0051] (b) preparing an extract of the product of (a);

[0052] (c) fractionating insoluble protein from complexes of soluble protein and DNA; and

[0053] (d) recovering DNA encoding a chaperonin gene from (c).

[0054] A further aspect of the invention is a method for optimizing expression of a protein encoded by a DNA substrate molecule comprising a fusion of a DNA sequence encoding the protein with a filamentous phage gene, wherein the fusion is carried on a phagemid comprising a library of chaperonin gene mutants, the method comprising:

[0055] (a) providing a host cell producing infectious particles expressing a fusion protein encoded by a library of mutagenized DNA substrate molecules;

[0056] (b) recovering from (a) infectious particles displaying the fusion protein;

[0057] (c) affinity purifying particles displaying the protein using a ligand for the protein; and

[0058] (d) recovering DNA encoding the mutant chaperonin from affinity purified particles of (c).

[0059] A further aspect of the invention is a method for optimizing secretion of a protein in a host by evolving a gene encoding a secretory function, comprising:

[0060] (a) providing a cluster of genes encoding secretory functions;

[0061] (b) recombining at least a first and second sequence in the gene cluster of (a) encoding a secretory function, the at least a first and second sequences differing from each other in at least one nucleotide, to generate a library of recombinant sequences;

[0062] (c) transforming a host cell culture with the products of (b), wherein the host cell comprises a DNA sequence encoding the protein;

[0063] (d) subjecting the product of (c) to screening or selection for secretion of the protein; and

[0064] (e) recovering DNA encoding an evolved gene encoding a secretory function from the product of (d).

[0065] A further aspect of the invention is a method for evolving an improved DNA polymerase comprising:

[0066] (a) providing a library of mutant DNA substrate molecules encoding mutant DNA polymerase;

[0067] (b) screening extracts of cells transfected with (a) and comparing activity with wild type DNA polymerase;

[0068] (c) recovering mutant DNA substrate molecules from cells in (b) expressing mutant DNA polymerase having improved activity over wild-type DNA polymerase; and

[0069] (d) recovering a DNA substrate molecule encoding an evolved polymerase from the products of (c).

[0070] A further aspect of the invention is a method for evolving a DNA polymerase with an error rate greater than that of wild type DNA polymerase comprising:

[0071] (a) providing a library of mutant DNA substrate molecules encoding mutant DNA polymerase in a host cell comprising an indicator gene having a revertible mutation, wherein the indicator gene is replicated by the mutant DNA polymerase;

[0072] (b) screening the products of (a) for revertants of the indicator gene;

[0073] (c) recovering mutant DNA substrate molecules from revertants; and

[0074] (d) recovering a DNA substrate molecule encoding an evolved polymerase from the products of (c).

[0075] A further aspect of the invention is a method for evolving a DNA polymerase, comprising:

[0076] (a) providing a library of mutant DNA substrate molecules encoding mutant DNA polymerase, the library comprising a plasmid vector;

[0077] (b) preparing plasmid preparations and extracts of host cells transfected with the products of (a);

[0078] (c) amplifying each plasmid preparation in a PCR reaction using the mutant polymerase encoded by that plasmid, the polymerase being present in the host cell extract;

[0079] (d) recovering the PCR products of (c); and

[0080] (e) recovering a DNA substrate molecule encoding an evolved polymerase from the products of (d).

[0081] A further aspect of the invention is a method for evolving a p-nitrophenol phosphonatase from a phosphonatase encoded by a DNA substrate molecule, comprising:

[0082] (a) providing a library of mutants of the DNA substrate molecule, the library comprising a plasmid expression vector;

[0083] (b) transfecting a host, wherein the host phn operon is deleted;

[0084] (c) selecting for growth of the transfectants of (b) using a p-nitrophenol phosphonate as a substrate;

[0085] (d) recovering the DNA substrate molecules from transfectants selected from (c); and

[0086] (e) recovering a DNA substrate molecule from (d) encoding an evolved phosphonatase.

[0087] A further aspect of the invention is a method for evolving a protease encoded by a DNA substrate molecule comprising:

[0088] (a) providing a library of mutants of the DNA substrate molecule, the library comprising a plasmid expression vector, wherein the DNA substrate molecule is linked to a secretory leader;

[0089] (b) transfecting a host;

[0090] (c) selecting for growth of the transfectants of (b) on a complex protein medium; and

[0091] (d) recovering a DNA substrate molecule from (c) encoding an evolved protease.

[0092] A further aspect of the invention is a method for screening a library of protease mutants displayed on a phage to obtain an improved protease, wherein a DNA substrate molecule encoding the protease is fused to DNA encoding a filamentous phage protein to generate a fusion protein, comprising:

[0093] (a) providing host cells expressing the fusion protein;

[0094] (b) overlaying host cells with a protein net to entrap the phage;

[0095] (c) washing the product of (b) to recover phage liberated by digestion of the protein net;

[0096] (d) recovering DNA from the product of (c); and

[0097] (e) recovering a DNA substrate from (d) encoding an improved protease.

[0098] A further aspect of the invention is a method for screening a library of protease mutants to obtain an improved protease, the method comprising:

[0099] (a) providing a library of peptide substrates, the peptide substrate comprising a fluorophore and a fluorescence quencher;

[0100] (b) screening the library of protease mutants for ability to cleave the peptide substrates, wherein fluorescence is measured; and

[0101] (c) recovering DNA encoding at least one protease mutant from (b).

[0102] A further aspect of the invention is a method for evolving an alpha interferon gene comprising:

[0103] (a) providing a library of mutant alpha interferon genes, the library comprising a filamentous phage vector;

[0104] (b) stimulating cells comprising a reporter construct, the reporter construct comprising a reporter gene under control of an interferon responsive promoter, and wherein the reporter gene is GFP;

[0105] (c) separating the cells expressing GFP by FACS;

[0106] (d) recovering phage from the product of (c); and

[0107] (e) recovering an evolved interferon gene from the product of (d).

[0108] A further aspect of the invention is a method for screening a library of mutants of a DNA substrate encoding a protein for an evolved DNA substrate, comprising:

[0109] (a) providing a library of mutants, the library comprising an expression vector;

[0110] (b) transfecting a mammalian host cell with the library of (a) wherein mutant protein is expressed on the surface of the cell;

[0111] (c) screening or selecting the products of (b) with a ligand for the protein;

[0112] (d) recovering DNA encoding mutant protein from the products of (c); and

[0113] (e) recovering an evolved DNA substrate from the products of (d).

[0114] A further aspect of the invention is a method for evolving a DNA substrate molecule encoding an interferon alpha, comprising:

[0115] (a) providing a library of mutant alpha interferon genes, the library comprising an expression vector wherein the alpha interferon genes are expressed under the control of an inducible promoter;

[0116] (b) transfecting host cells with the library of (a);

[0117] (c) contacting the product of (b) with a virus;

[0118] (d) recovering DNA encoding a mutant alpha interferon from host cells surviving step (c); and

[0119] (e) recovering an evolved interferon gene from the product of (d).

[0120] A further aspect of the invention is a method for evolving the stability of a protein encoded by a DNA substrate molecule, the DNA substrate molecule comprising a fusion of a DNA sequence encoding the protein with a DNA sequence encoding a filamentous phage protein to generate a fusion protein, the method comprising:

[0121] (a) providing a host cell expressing a library of mutants of the fusion protein;

[0122] (b) affinity purifying the mutants with a ligand for the protein, wherein the ligand is a human serum protein, tissue specific protein, or receptor;

[0123] (c) recovering DNA encoding a mutant protein from the affinity selected mutants of (b); and

[0124] (d) recovering an evolved gene encoding the protein from the product of (c).

[0125] A further aspect of the invention is a method for evolving a protein having at least two subunits, comprising:

[0126] (a) providing a library of mutant DNA substrate molecules for each subunit;

[0127] (b) recombining the libraries into a library of single chain constructs of the protein, the single chain construct comprising a DNA substrate molecule encoding each subunit sequence, the subunit sequence being linked by a linker at a nucleic acid sequence encoding the amino terminus of one subunit to a nucleic acid sequence encoding the carboxy terminus of a second subunit;

[0128] (c) screening or selecting the products of (b);

[0129] (d) recovering recombinant single chain construct DNA substrate molecules from the products of (c);

[0130] (e) subjecting the products of (d) to mutagenesis; and

[0131] (f) recovering an evolved single chain construct DNA substrate molecule from (e).

[0132] A further aspect of the invention is a method for evolving the coupling of a mammalian 7-transmembrane receptor to a yeast signal transduction pathway, comprising:

[0133] (a) expressing a library of mammalian G alpha protein mutants in a host cell, wherein the host cell expresses the mammalian 7-transmembrane receptor and a reporter gene, the reporter gene being expressed under control of a pheromone responsive promoter;

[0134] (b) screening or selecting the products of (a) for expression of the reporter gene in the presence of a ligand for the 7-transmembrane receptor; and

[0135] (c) recovering DNA encoding an evolved G alpha protein mutant from screened or selected products of (b).

[0136] A further aspect of the invention is a method for recombining at least a first and second DNA substrate molecule, comprising:

[0137] (a) transfecting a host cell with at least a first and second DNA substrate molecule wherein the at least a first and second DNA substrate molecules are recombined in the host cell;

[0138] (b) screening or selecting the products of (a) for a desired property; and

[0139] (c) recovering recombinant DNA substrate molecules from (b).

[0140] A further aspect of the invention is a method for evolving a DNA substrate sequence encoding a protein of interest, wherein the DNA substrate comprises a vector, the vector comprising single-stranded DNA, the method comprising:

[0141] (a) providing single-stranded vector DNA and a library of mutants of the DNA substrate sequence;

[0142] (b) annealing single stranded DNA from the library of (a) to the single stranded vector DNA of (a);

[0143] (c) transforming the products of (b) into a host;

[0144] (d) screening the product of (c) for a desired property; and

[0145] (e) recovering evolved DNA substrate DNA from the products of (d).

BRIEF DESCRIPTION OF THE DRAWINGS

[0146] FIG. 1 depicts the alignment of oligo PCR primers for evolution of bovine calf intestinal alkaline phosphatase.

[0147] FIG. 2 depicts the alignment of alpha interferon amino acid and nucleic acid sequences.

DESCRIPTION OF THE SPECIFIC EMBODIMENTS

[0148] The invention provides a number of strategies for evolving polypeptides through recursive recombination methods. In some embodiments, the strategies of the invention can generally be classified as “coarse grain shuffling” and “fine grain shuffling.” As described in detail below, these strategies are especially applicable in situations where some structural or functional information is available regarding the polypeptides of interest, where the nucleic acid to be manipulated is large, when selection or screening of many recombinants is cumbersome, and so on. “Coarse grain shuffling” generally involves the exchange or recombination of segments of nucleic acids, whether defined as functional domains, exons, restriction endonuclease fragments, or otherwise arbitrarily defined segments. “Fine grain shuffling” generally involves the introduction of sequence variation within a segment, such as within codons.

[0149] Coarse grain and fine grain shuffling allow analysis of variation occurring within a nucleic acid sequence, also termed “searching of sequence space.” Although both techniques are meritorious, the results are qualitatively different. For example, coarse grain searches are often better suited for optimizing multigene clusters such as polyketide operons, whereas fine grain searches are often the better choice for optimizing a property such as protein expression using codon usage libraries.

[0150] The strategies generally entail evolution of gene(s) or segment(s) thereof to allow retention of function in a heterologous cell or improvement of function in a homologous or heterologous cell. Evolution is effected generally by a process termed recursive sequence recombination. Recursive sequence recombination can be achieved in many different formats and permutations of formats, as described in further detail below. These formats share some common principles. Recursive sequence recombination entails successive cycles of recombination to generate molecular diversity, i.e., the creation of a family of nucleic acid molecules showing substantial sequence identity to each other but differing in the presence of mutations. Each recombination cycle is followed by at least one cycle of screening or selection for molecules having a desired characteristic. The molecule(s) selected in one round form the starting materials for generating diversity in the next round. In any given cycle, recombination can occur in vivo or in vitro. Furthermore, diversity resulting from recombination can be augmented in any cycle by applying prior methods of mutagenesis (e.g., error-prone PCR or cassette mutagenesis, passage through bacterial mutator strains, treatment with chemical mutagens) to either the substrates for or products of recombination.
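The cycle structure just described (recombine, screen or select, carry the selected molecules into the next round) can be illustrated with a short in silico sketch. It is a toy model under stated assumptions and is not the disclosed laboratory procedure: sequences are equal-length strings, recombination is a single crossover between two randomly chosen parents, and the screen is an arbitrary scoring function supplied by the caller. All names and parameters below are hypothetical.

```python
import random


def recombine(parents, rng):
    """Single crossover between two parents drawn at random (toy model)."""
    a, b = rng.sample(parents, 2)
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def recursive_recombination(pool, score, cycles, library_size, keep, rng):
    """Run `cycles` rounds of recombination followed by selection.

    pool: starting sequences (equal length, at least two of them)
    score: function mapping a sequence to a number (higher is better)
    keep: number of molecules carried into the next round (at least two)
    """
    pool = list(pool)
    for _ in range(cycles):
        # Recombination step: build a library from the current pool.
        library = [recombine(pool, rng) for _ in range(library_size)]
        # Parents stay in the screen, so the best score never decreases.
        library.extend(pool)
        # Selection step: the best molecules seed the next round.
        library.sort(key=score, reverse=True)
        pool = library[:keep]
    return pool


if __name__ == "__main__":
    target = "ACGTACGTAC"
    start = ["ACGTAAAAAA", "AAAAACGTAC", "AAAAAAAAAA"]
    matches = lambda s: sum(x == y for x, y in zip(s, target))
    best = recursive_recombination(start, matches, 5, 50, 3, random.Random(1))
    print(best[0], matches(best[0]))
```

Additional mutagenesis, as mentioned above, could be modelled by perturbing library members before the selection step; that is omitted here to keep the sketch short.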

[0151] I. Formats for Recursive Sequence Recombination

[0152] Some formats and examples for recursive sequence recombination, sometimes referred to as DNA shuffling, evolution, or molecular breeding, have been described by the present inventors and co-workers in co-pending applications U.S. patent application Ser. No. 08/198,431, filed Feb. 17, 1994, Ser. No. PCT/US95/02126, filed Feb. 17, 1995, Ser. No. 08/425,684, filed Apr. 18, 1995, Ser. No. 08/537,874, filed Oct. 30, 1995, Ser. No. 08/564,955, filed Nov. 30, 1995, Ser. No. 08/621,859, filed Mar. 25, 1996, Ser. No. 08/621,430, filed Mar. 25, 1996, Ser. No. PCT/US96/05480, filed Apr. 18, 1996, Ser. No. 08/650,400, filed May 20, 1996, Ser. No. 08/675,502, filed Jul. 3, 1996, Ser. No. 08/721,824, filed Sep. 27, 1996, and Ser. No. 08/722,660, filed Sep. 27, 1996; Stemmer, Science 270:1510 (1995); Stemmer et al., Gene 164:49-53 (1995); Stemmer, Bio/Technology 13:549-553 (1995); Stemmer, Proc. Natl. Acad. Sci. U.S.A. 91:10747-10751 (1994); Stemmer, Nature 370:389-391 (1994); Crameri et al., Nature Medicine 2(1):1-3 (1996); Crameri et al., Nature Biotechnology 14:315-319 (1996), each of which is incorporated by reference in its entirety for all purposes.

[0153] In general, the term “gene” is used herein broadly to refer to any segment or sequence of DNA associated with a biological function. Genes can be obtained from a variety of sources, including cloning from a source of interest or synthesizing from known or predicted sequence information, and may include sequences designed to have desired parameters.

[0154] A wide variety of cell types can be used as a recipient of evolved genes. Cells of particular interest include many bacterial cell types, both gram-negative and gram-positive, such as Rhodococcus, Streptomycetes, Actinomycetes, Corynebacteria, Penicillium, Bacillus, Escherichia coli, Pseudomonas, Salmonella, and Erwinia. Cells of interest also include eukaryotic cells, particularly mammalian cells (e.g., mouse, hamster, primate, human), both cell lines and primary cultures. Such cells include stem cells, including embryonic stem cells, zygotes, fibroblasts, lymphocytes, Chinese hamster ovary (CHO), mouse fibroblasts (NIH3T3), kidney, liver, muscle, and skin cells. Other eukaryotic cells of interest include plant cells, such as maize, rice, wheat, cotton, soybean, sugarcane, tobacco, and arabidopsis; fish, algae, fungi (Penicillium, Fusarium, Aspergillus, Podospora, Neurospora), insects, and yeasts (Pichia and Saccharomyces).

[0155] The choice of host will depend on a number of factors related to the intended use of the engineered host, including pathogenicity, substrate range, environmental hardiness, presence of key intermediates, ease of genetic manipulation, and likelihood of promiscuous transfer of genetic information to other organisms. A preferred host has the ability to replicate vector DNA, express proteins of interest, and properly traffic proteins of interest. Particularly advantageous hosts are E. coli, lactobacilli, Streptomycetes, Actinomycetes, fungi such as Saccharomyces cerevisiae or Pichia pastoris, Schneider cells, L-cells, COS cells, CHO cells, and transformed B cell lines such as SP2/0, J558, NS-1 and AG8-653.

[0156] The breeding procedure starts with at least two substrates that generally show substantial sequence identity to each other (i.e., at least about 50%, 70%, 80% or 90% sequence identity), but differ from each other at certain positions. The difference can be any type of mutation, for example, substitutions, insertions and deletions. Often, different segments differ from each other in perhaps 5-20 positions. For recombination to generate increased diversity relative to the starting materials, the starting materials must differ from each other in at least two nucleotide positions. That is, if there are only two substrates, there should be at least two divergent positions. If there are three substrates, for example, one substrate can differ from the second at a single position, and the second can differ from the third at a different single position. The starting DNA segments can be natural variants of each other, for example, allelic or species variants. The segments can also be from nonallelic genes showing some degree of structural and usually functional relatedness (e.g., different genes within a superfamily such as the immunoglobulin superfamily). The starting DNA segments can also be induced variants of each other. For example, one DNA segment can be produced by error-prone PCR replication of the other, or by substitution of a mutagenic cassette. Induced mutants can also be prepared by propagating one (or both) of the segments in a mutagenic strain. In these situations, strictly speaking, the second DNA segment is not a single segment but a large family of related segments. The different segments forming the starting materials are often the same length or substantially the same length. However, this need not be the case; for example, one segment can be a subsequence of another. The segments can be present as part of larger molecules, such as vectors, or can be in isolated form.
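The requirement of at least two divergent positions can be checked with a small worked example. The sketch below assumes aligned substrates of equal length containing substitutions only, and treats a recombinant as any sequence that takes, at each position, the base found in some substrate at that position. The function names and the four-base example sequences are hypothetical.

```python
from itertools import product


def all_recombinants(substrates):
    """Every sequence built by choosing, per position, a base from some substrate."""
    columns = [sorted(set(column)) for column in zip(*substrates)]
    return {"".join(choice) for choice in product(*columns)}


def novel_recombinants(substrates):
    """Recombinants that are not already present among the substrates."""
    return all_recombinants(substrates) - set(substrates)


if __name__ == "__main__":
    # Two substrates, one divergent position: nothing new can arise.
    print(novel_recombinants(["ACGT", "ACGA"]))
    # Two substrates, two divergent positions: two new combinations.
    print(sorted(novel_recombinants(["ACGT", "TCGA"])))
    # Three substrates differing pairwise at different single positions.
    print(sorted(novel_recombinants(["ACGT", "TCGT", "TCGA"])))
```

With a single divergent position the set of recombinants equals the set of substrates. With two divergent positions the reciprocal combinations appear, and in the three-substrate case the one combination missing from the starting materials (here ACGA) becomes reachable.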

[0157] The starting DNA segments are recombined by any of the recursive sequence recombination formats provided herein to generate a diverse library of recombinant DNA segments. Such a library can vary widely in size from having fewer than 10 to more than 10⁵, 10⁹, or 10¹² members. In general, the starting segments and the recombinant libraries generated include full-length coding sequences and any essential regulatory sequences, such as a promoter and polyadenylation sequence, required for expression. However, if this is not the case, the recombinant DNA segments in the library can be inserted into a common vector providing the missing sequences before performing screening/selection.

[0158] If the recursive sequence recombination format employed is an in vivo format, the library of recombinant DNA segments generated already exists in a cell, which is usually the cell type in which expression of the enzyme with altered substrate specificity is desired. If recursive sequence recombination is performed in vitro, the recombinant library is preferably introduced into the desired cell type before screening/selection. The members of the recombinant library can be linked to an episome or virus before introduction or can be introduced directly. In some embodiments of the invention, the library is amplified in a first host, and is then recovered from that host and introduced to a second host more amenable to expression, selection, or screening, or any other desirable parameter. The manner in which the library is introduced into the cell type depends on the DNA-uptake characteristics of the cell type, e.g., having viral receptors, being capable of conjugation, or being naturally competent. If the cell type is insusceptible to natural and chemical-induced competence, but susceptible to electroporation, one would usually employ electroporation. If the cell type is insusceptible to electroporation as well, one can employ biolistics. The biolistic PDS-1000 Gene Gun (Biorad, Hercules, Calif.) uses helium pressure to accelerate DNA-coated gold or tungsten microcarriers toward target cells. The process is applicable to a wide range of tissues, including plants, bacteria, fungi, algae, intact animal tissues, tissue culture cells, and animal embryos. One can employ electronic pulse delivery, which is essentially a mild electroporation format for live tissues in animals and patients. Zhao, Advanced Drug Delivery Reviews 17:257-262 (1995). Novel methods for making cells competent are described in co-pending application U.S. patent application Ser. No. 08/621,430, filed Mar. 25, 1996. After introduction of the library of recombinant DNA genes, the cells are optionally propagated to allow expression of genes to occur.

[0159] A. In Vitro Formats

[0160] One format for recursive sequence recombination utilizes a pool of related sequences. The sequences can be DNA or RNA and can be of various lengths depending on the size of the gene or DNA fragment to be recombined or reassembled. Preferably the sequences are from 50 bp to 100 kb.

[0161] The pool of related substrates can be fragmented, usually at random, into fragments of from about 5 bp to 5 kb or more. Preferably the size of the random fragments is from about 10 bp to 1000 bp, more preferably the size of the DNA fragments is from about 20 bp to 500 bp. The substrates can be digested by a number of different methods, such as DNase I or RNase digestion, random shearing or restriction enzyme digestion. The concentration of nucleic acid fragments of a particular length is often less than 0.1% or 1% by weight of the total nucleic acid. The number of different specific nucleic acid fragments in the mixture is usually at least about 100, 500 or 1000.
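Random fragmentation into a chosen size window can be modelled in a few lines. The sketch below is only an approximation for illustration: it draws fragment lengths uniformly between a minimum and a maximum, whereas real DNase I digestion or shearing produces a different length distribution. The function name and parameters are hypothetical.

```python
import random


def fragment(seq, min_len, max_len, rng):
    """Split seq into consecutive pieces with lengths drawn uniformly
    from [min_len, max_len]. The final piece may be shorter than
    min_len, because it is whatever remains at the end of seq."""
    pieces = []
    pos = 0
    while pos < len(seq):
        size = rng.randint(min_len, max_len)
        pieces.append(seq[pos:pos + size])
        pos += size
    return pieces


if __name__ == "__main__":
    rng = random.Random(0)
    substrate = "ACGT" * 500  # a 2 kb toy substrate
    pieces = fragment(substrate, 20, 500, rng)
    print(len(pieces), [len(p) for p in pieces])
```

Calling the function once per substrate and pooling the results gives a mixed fragment population of the kind used in the subsequent denaturation and reannealing steps.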

[0162] The mixed population of nucleic acid fragments is denatured by heating to about 80° C. to 100° C., more preferably from 90° C. to 96° C., to form single-stranded nucleic acid fragments. Single-stranded nucleic acid fragments having regions of sequence identity with other single-stranded nucleic acid fragments can then be reannealed by cooling to 20° C. to 75° C., and preferably from 40° C. to 65° C. Renaturation can be accelerated by the addition of polyethylene glycol (“PEG”) or salt. The salt concentration is preferably from 0 mM to 600 mM, more preferably the salt concentration is from 10 mM to 100 mM. The salt may be such salts as (NH₄)₂SO₄, KCl, or NaCl. The concentration of PEG is preferably from 0% to 20%, more preferably from 5% to 10%. The fragments that reanneal can be from different substrates.

[0163] The annealed nucleic acid fragments are incubated in the presence of a nucleic acid polymerase, such as Taq or Klenow, Mg⁺⁺ at 1 mM-20 mM, and dNTP's (i.e. dATP, dCTP, dGTP and dTTP). If regions of sequence identity are large, Taq or other high-temperature polymerase can be used with an annealing temperature of between 45-65° C. If the areas of identity are small, Klenow or other low-temperature polymerases can be used with an annealing temperature of between 20-30° C. The polymerase can be added to the random nucleic acid fragments prior to annealing, simultaneously with annealing or after annealing.

[0164] The cycle of denaturation, renaturation and incubation of random nucleic acid fragments in the presence of polymerase is sometimes referred to as “shuffling” of the nucleic acid in vitro. This cycle is repeated for a desired number of times. Preferably the cycle is repeated from 2 to 100 times, more preferably the cycle is repeated from 10 to 40 times. The resulting nucleic acids are a family of double-stranded polynucleotides of from about 50 bp to about 100 kb, preferably from 500 bp to 50 kb. The population represents variants of the starting substrates showing substantial sequence identity thereto but also diverging at several positions. The population has many more members than the starting substrates. The population of fragments resulting from recombination is preferably first amplified by PCR, then cloned into an appropriate vector and the ligation mixture used to transform host cells.

[0165] In a variation of in vitro shuffling, subsequences of recombination substrates can be generated by amplifying the full-length sequences under conditions which produce a substantial fraction, typically at least 20 percent, of incompletely extended amplification products. The amplification products, including the incompletely extended amplification products, are denatured and subjected to at least one additional cycle of reannealing and amplification. This variation, wherein at least one cycle of reannealing and amplification provides a substantial fraction of incompletely extended products, is termed “stuttering.” In the subsequent amplification round, the incompletely extended products anneal to and prime extension on different sequence-related template species.

[0166] In a further variation, at least one cycle of amplification can be conducted using a collection of overlapping single-stranded DNA fragments of related sequence and different lengths. Each fragment can hybridize to and prime polynucleotide chain extension of a second fragment from the collection, thus forming sequence-recombined polynucleotides. In a further variation, single-stranded DNA fragments of variable length can be generated from a single primer by Vent DNA polymerase on a first DNA template. The single-stranded DNA fragments are used as primers for a second, Kunkel-type template, consisting of a uracil-containing circular single-stranded DNA. This results in multiple substitutions of the first template into the second (see Levichkin et al., Mol. Biology 29:572-577 (1995)).

[0167] Nucleic acid sequences can be recombined by recursive sequence recombination even if they lack sequence homology. Homology can be introduced using synthetic oligonucleotides as PCR primers. In addition to the specific sequences for the nucleic acid segment being amplified, all of the primers used to amplify one particular segment are synthesized to contain an additional sequence of 20-40 bases 5′ to the gene (sequence A) and a different 20-40 base sequence 3′ to the segment (sequence B). An adjacent segment is amplified using a 5′ primer which contains the complementary strand of sequence B (sequence B′), and a 3′ primer containing a different 20-40 base sequence (C). Similarly, primers for the next adjacent segment contain sequences C′ (complementary to C) and D. In this way, small regions of homology are introduced, making the segments into site-specific recombination cassettes. Subsequent to the initial amplification of individual segments, the amplified segments can then be mixed and subjected to primerless PCR.
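The way shared tails turn unrelated segments into ordered cassettes can be sketched as follows. This is an illustration only: it tracks the top strand of each amplified product, the tail and segment sequences are invented placeholders, and the function names are hypothetical. The join step merely mimics the outcome of overlap extension across a shared tail and is not a simulation of PCR.

```python
def add_homology_tails(segments, tails):
    """Return top-strand products: tail[i] + segment[i] + tail[i + 1].

    Adjacent products therefore share one tail (B, C, ...), which is the
    introduced region of homology.
    """
    if len(tails) != len(segments) + 1:
        raise ValueError("need exactly one more tail than segments")
    return [tails[i] + seg + tails[i + 1] for i, seg in enumerate(segments)]


def join_on_shared_tail(left, right, tail):
    """Merge two products that overlap on the given tail."""
    if not (left.endswith(tail) and right.startswith(tail)):
        raise ValueError("products do not share the expected tail")
    return left + right[len(tail):]


# Invented 20-base tails standing in for sequences A, B and C.
TAIL_A = "GATTACAGATTACAGATTAC"
TAIL_B = "CCGGTTAACGTACGATCGGA"
TAIL_C = "TTGACCATGGTCAATTGACC"

# Two unrelated placeholder segments with no homology to each other.
seg1, seg2 = "ATGAAACGTCTG", "GGTCACGAATAA"

product1, product2 = add_homology_tails([seg1, seg2], [TAIL_A, TAIL_B, TAIL_C])
assembled = join_on_shared_tail(product1, product2, TAIL_B)
```

In this sketch the order of the segments is fixed by which tail each pair of neighbors shares, which is the sense in which the segments behave as site-specific recombination cassettes.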

[0168] When domains within a polypeptide are shuffled, it may not be possible to introduce additional flanking sequences to the domains, due to the constraint of maintaining a continuous open reading frame. Instead, groups of oligonucleotides are synthesized that are homologous to the 3′ end of the first domain encoded by one of the genes to be shuffled, and the 5′ ends of the second domains encoded by all of the other genes to be shuffled together. This is repeated with all domains, thus providing sequences that allow recombination between protein domains while maintaining their order.

[0169] B. In Vivo Formats

[0170] 1. Plasmid-Plasmid Recombination

[0171] The initial substrates for recombination are a collection ofpolynucleotides comprising variant forms of a gene. The variant formsusually show substantial sequence identity to each other sufficient toallow homologous recombination between substrates. The diversity betweenthe polynucleotides can be natural (e.g., allelic or species variants),induced (e.g., error-prone PCR or error-prone recursive sequencerecombination), or the result of in vitro recombination. Diversity canalso result from resynthesizing genes encoding natural proteins withalternative codon usage. There should be at least sufficient diversitybetween substrates that recombination can generate more diverse productsthan there are starting materials. There must be at least two substratesdiffering in at least two positions. However, commonly a library ofsubstrates of 10³-10⁸ members is employed. The degree of diversitydepends on the length of the substrate being recombined and the extentof the functional change to be evolved. Diversity at between 0.1-25% ofpositions is typical. The diverse substrates are incorporated intoplasmids. The plasmids are often standard cloning vectors, e.g.,bacterial multicopy plasmids. However, in some methods to be describedbelow, the plasmids include mobilization (MOB) functions. The substratescan be incorporated into the same or different plasmids. Often at leasttwo different types of plasmid having different types of selectablemarkers are used to allow selection for cells containing at least twotypes of vector. Also, where different types of plasmid are employed,the different plasmids can come from two distinct incompatibility groupsto allow stable co-existence of two different plasmids within the cell.Nevertheless, plasmids from the same incompatibility group can stillco-exist within the same cell for sufficient time to allow homologousrecombination to occur.

[0172] Plasmids containing diverse substrates are initially introducedinto cells by any method (e.g., chemical transformation, naturalcompetence, electroporation, biolistics, packaging into phage or viralsystems). Often, the plasmids are present at or near saturatingconcentration (with respect to maximum transfection capacity) toincrease the probability of more than one plasmid entering the samecell. The plasmids containing the various substrates can be transfectedsimultaneously or in multiple rounds. For example, in the latterapproach cells can be transfected with a first aliquot of plasmid,transfectants selected and propagated, and then infected with a secondaliquot of plasmid.

[0173] Having introduced the plasmids into cells, recombination between substrates to generate recombinant genes occurs within cells containing multiple different plasmids merely by propagating the cells. However, cells that receive only one plasmid are unable to participate in recombination and the potential contribution of substrates on such plasmids to evolution is not fully exploited (although these plasmids may contribute to some extent if they are propagated in mutator cells). The rate of evolution can be increased by allowing all substrates to participate in recombination. This can be achieved by subjecting transfected cells to electroporation. The conditions for electroporation are the same as those conventionally used for introducing exogenous DNA into cells (e.g., 1,000-2,500 volts, 400 μF and a 1-2 mm gap). Under these conditions, plasmids are exchanged between cells, allowing all substrates to participate in recombination. In addition, the products of recombination can undergo further rounds of recombination with each other or with the original substrate. The rate of evolution can also be increased by use of conjugative transfer. To exploit conjugative transfer, substrates can be cloned into plasmids having MOB genes, and tra genes are also provided in cis or in trans to the MOB genes. The effect of conjugative transfer is very similar to electroporation in that it allows plasmids to move between cells and allows recombination between any substrate and the products of previous recombination to occur, merely by propagating the culture. The rate of evolution can also be increased by fusing cells to induce exchange of plasmids or chromosomes. Fusion can be induced by chemical agents, such as PEG, or viral proteins, such as influenza virus hemagglutinin, HSV-1 gB and gD. The rate of evolution can also be increased by use of mutator host cells (e.g., Mut L, S, D, T, H in bacteria and Ataxia telangiectasia human cell lines).

[0174] The time for which cells are propagated and recombination isallowed to occur, of course, varies with the cell type but is generallynot critical, because even a small degree of recombination cansubstantially increase diversity relative to the starting materials.Cells bearing plasmids containing recombined genes are subject toscreening or selection for a desired function. For example, if thesubstrate being evolved contains a drug resistance gene, one wouldselect for drug resistance. Cells surviving screening or selection canbe subjected to one or more rounds of screening/selection followed byrecombination or can be subjected directly to an additional round ofrecombination. “Screening” as used herein is intended to include“selection” as a type of screen.

[0175] The next round of recombination can be achieved by severaldifferent formats independently of the previous round. For example, afurther round of recombination can be effected simply by resuming theelectroporation or conjugation-mediated intercellular transfer ofplasmids described above. Alternatively, a fresh substrate orsubstrates, the same or different from previous substrates, can betransfected into cells surviving selection/screening. Optionally, thenew substrates are included in plasmid vectors bearing a differentselective marker and/or from a different incompatibility group than theoriginal plasmids. As a further alternative, cells survivingselection/screening can be subdivided into two subpopulations, andplasmid DNA from one subpopulation transfected into the other, where thesubstrates from the plasmids from the two subpopulations undergo afurther round of recombination. In either of the latter two options, therate of evolution can be increased by employing DNA extraction,electroporation, conjugation or mutator cells, as described above. In astill further variation, DNA from cells surviving screening/selectioncan be extracted and subjected to in vitro recursive sequencerecombination.

[0176] After the second round of recombination, a second round of screening/selection is performed, preferably under conditions of increased stringency. If desired, further rounds of recombination and selection/screening can be performed using the same strategy as for the second round. With successive rounds of recombination and selection/screening, the surviving recombined substrates evolve toward acquisition of a desired phenotype. Typically, in this and other methods of recursive recombination, the final product of recombination that has acquired the desired phenotype differs from starting substrates at 0.1%-25% of positions and has evolved at a rate orders of magnitude in excess (e.g., by at least 10-fold, 100-fold, 1000-fold, or 10,000-fold) of the rate of evolution driven by naturally acquired mutation of about 1 mutation per 10⁹ positions per generation (see Anderson et al., Proc. Natl. Acad. Sci. U.S.A. 93:906-907 (1996)). The “final product” may be transferred to another host more desirable for utilization of the “shuffled” DNA. This is particularly advantageous in situations where the more desirable host is less efficient as a host for the many cycles of mutation/recombination because it lacks the molecular biology or genetic tools that are available for other organisms such as E. coli.

[0177] 2. Virus-Plasmid Recombination

[0178] The strategy used for plasmid-plasmid recombination can also beused for virus-plasmid recombination; usually, phage-plasmidrecombination. However, some additional comments particular to the useof viruses are appropriate. The initial substrates for recombination arecloned into both plasmid and viral vectors. It is usually not criticalwhich substrate(s) is/are inserted into the viral vector and which intothe plasmid, although usually the viral vector should contain differentsubstrate(s) from the plasmid. As before, the plasmid (and the virus)typically contains a selective marker. The plasmid and viral vectors canboth be introduced into cells by transfection as described above.However, a more efficient procedure is to transfect the cells withplasmid, select transfectants and infect the transfectants with virus.Because the efficiency of infection of many viruses approaches 100% ofcells, most cells transfected and infected by this route contain both aplasmid and virus bearing different substrates.

[0179] Homologous recombination occurs between plasmid and virusgenerating both recombined plasmids and recombined virus. For someviruses, such as filamentous phage, in which intracellular DNA exists inboth double-stranded and single-stranded forms, both can participate inrecombination. Provided that the virus is not one that rapidly killscells, recombination can be augmented by use of electroporation orconjugation to transfer plasmids between cells. Recombination can alsobe augmented for some types of virus by allowing the progeny virus fromone cell to reinfect other cells. For some types of virus, virusinfected-cells show resistance to superinfection. However, suchresistance can be overcome by infecting at high multiplicity and/orusing mutant strains of the virus in which resistance to superinfectionis reduced.

[0180] The result of infecting plasmid-containing cells with virusdepends on the nature of the virus. Some viruses, such as filamentousphage, stably exist with a plasmid in the cell and also extrude progenyphage from the cell. Other viruses, such as lambda having a cosmidgenome, stably exist in a cell like plasmids without producing progenyvirions. Other viruses, such as the T-phage and lytic lambda, undergorecombination with the plasmid but ultimately kill the host cell anddestroy plasmid DNA. For viruses that infect cells without killing thehost, cells containing recombinant plasmids and virus can bescreened/selected using the same approach as for plasmid-plasmidrecombination. Progeny virus extruded by cells survivingselection/screening can also be collected and used as substrates insubsequent rounds of recombination. For viruses that kill their hostcells, recombinant genes resulting from recombination reside only in theprogeny virus. If the screening or selective assay requires expressionof recombinant genes in a cell, the recombinant genes should betransferred from the progeny virus to another vector, e.g., a plasmidvector, and retransfected into cells before selection/screening isperformed.

[0181] For filamentous phage, the products of recombination are presentin both cells surviving recombination and in phage extruded from thesecells. The dual source of recombinant products provides some additionaloptions relative to the plasmid-plasmid recombination. For example, DNAcan be isolated from phage particles for use in a round of in vitrorecombination. Alternatively, the progeny phage can be used to transfector infect cells surviving a previous round of screening/selection, orfresh cells transfected with fresh substrates for recombination.

[0182] 3. Virus-Virus Recombination

[0183] The principles described for plasmid-plasmid and plasmid-viralrecombination can be applied to virus-virus recombination with a fewmodifications. The initial substrates for recombination are cloned intoa viral vector. Usually, the same vector is used for all substrates.Preferably, the virus is one that, naturally or as a result of mutation,does not kill cells. After insertion, some viral genomes can be packagedin vitro or using a packaging cell line. The packaged viruses are usedto infect cells at high multiplicity such that there is a highprobability that a cell will receive multiple viruses bearing differentsubstrates.

[0184] After the initial round of infection, subsequent steps depend onthe nature of infection as discussed in the previous section. Forexample, if the viruses have phagemid (Sambrook et al., MolecularCloning, CSH Press, 1987) genomes such as lambda cosmids or M13, F1 orFd phagemids, the phagemids behave as plasmids within the cell andundergo recombination simply by propagating the cells. Recombination isparticularly efficient between single-stranded forms of intracellularDNA. Recombination can be augmented by electroporation of cells.

[0185] Following selection/screening, cosmids containing recombinantgenes can be recovered from surviving cells, e.g., by heat induction ofa cos⁻ lysogenic host cell, or extraction of DNA by standard procedures,followed by repackaging cosmid DNA in vitro.

[0186] If the viruses are filamentous phage, recombination ofreplicating form DNA occurs by propagating the culture of infectedcells. Selection/screening identifies colonies of cells containing viralvectors having recombinant genes with improved properties, together withinfectious particles (i.e., phage or packaged phagemids) extruded fromsuch cells. Subsequent options are essentially the same as forplasmid-viral recombination.

[0187] 4. Chromosome Recombination

[0188] This format can be used to especially evolve chromosomalsubstrates. The format is particularly preferred in situations in whichmany chromosomal genes contribute to a phenotype or one does not knowthe exact location of the chromosomal gene(s) to be evolved. The initialsubstrates for recombination are cloned into a plasmid vector. If thechromosomal gene(s) to be evolved are known, the substrates constitute afamily of sequences showing a high degree of sequence identity but somedivergence from the chromosomal gene. If the chromosomal genes to beevolved have not been located, the initial substrates usually constitutea library of DNA segments of which only a small number show sequenceidentity to the gene or gene(s) to be evolved. Divergence betweenplasmid-borne substrate and the chromosomal gene(s) can be induced bymutagenesis or by obtaining the plasmid-borne substrates from adifferent species than that of the cells bearing the chromosome.

[0189] The plasmids bearing substrates for recombination are transfectedinto cells having chromosomal gene(s) to be evolved. Evolution can occursimply by propagating the culture, and can be accelerated bytransferring plasmids between cells by conjugation or electroporation.Evolution can be further accelerated by use of mutator host cells or byseeding a culture of nonmutator host cells being evolved with mutatorhost cells and inducing intercellular transfer of plasmids byelectroporation or conjugation. Preferably, mutator host cells used forseeding contain a negative selectable marker to facilitate isolation ofa pure culture of the nonmutator cells being evolved.Selection/screening identifies cells bearing chromosomes and/or plasmidsthat have evolved toward acquisition of a desired function.

[0190] Subsequent rounds of recombination and selection/screeningproceed in similar fashion to those described for plasmid-plasmidrecombination. For example, further recombination can be effected bypropagating cells surviving recombination in combination withelectroporation or conjugative transfer of plasmids. Alternatively,plasmids bearing additional substrates for recombination can beintroduced into the surviving cells. Preferably, such plasmids are froma different incompatibility group and bear a different selective markerthan the original plasmids to allow selection for cells containing atleast two different plasmids. As a further alternative, plasmid and/orchromosomal DNA can be isolated from a subpopulation of surviving cellsand transfected into a second subpopulation. Chromosomal DNA can becloned into a plasmid vector before transfection.

[0191] 5. Virus-Chromosome Recombination

[0192] As in the other methods described above, the virus is usually onethat does not kill the cells, and is often a phage or phagemid. Theprocedure is substantially the same as for plasmid-chromosomerecombination. Substrates for recombination are cloned into the vector.Vectors including the substrates can then be transfected into cells orin vitro packaged and introduced into cells by infection. Viral genomesrecombine with host chromosomes merely by propagating a culture.Evolution can be accelerated by allowing intercellular transfer of viralgenomes by electroporation, or reinfection of cells by progeny virions.Screening/selection identifies cells having chromosomes and/or viralgenomes that have evolved toward acquisition of a desired function.

[0193] There are several options for subsequent rounds of recombination.For example, viral genomes can be transferred between cells survivingselection/recombination by electroporation. Alternatively, virusesextruded from cells surviving selection/screening can be pooled and usedto superinfect the cells at high multiplicity. Alternatively, freshsubstrates for recombination can be introduced into the cells, either onplasmid or viral vectors.

[0194] II. Application of Recursive Sequence Recombination to Evolution of Polypeptides

[0195] In addition to the techniques described above, some additionally advantageous modifications of these techniques for the evolution of polypeptides are described below. These methods are referred to as “fine grain” and “coarse grain” shuffling. The coarse grain methods allow one to exchange chunks of genetic material between substrate nucleic acids, thereby limiting diversity in the resulting recombinants to exchanges or substitutions of domains, restriction fragments, oligo-encoded blocks of mutations, or other arbitrarily defined segments, rather than introducing diversity more randomly across the substrate. In contrast to coarse grain shuffling, fine grain shuffling methods allow the generation of all possible recombinations, or permutations, of a given set of very closely linked mutations, including multiple permutations, within a single segment, such as a codon.

[0196] In some embodiments, coarse grain or fine grain shuffling techniques are not performed as exhaustive searches of all possible mutations within a nucleic acid sequence. Rather, these techniques are utilized to provide a sampling of variation possible within a gene based on known sequence or structural information. The size of the sample is typically determined by the nature of the screen or selection process. For example, when a screen is performed in a 96-well microtiter format, it may be preferable to limit the size of the recombinant library to about 100 such microtiter plates for convenience in screening.
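The arithmetic behind the microtiter example can be made explicit. The sketch below is an illustration under stated assumptions: each site of interest is treated as having only two states (wild type or mutant), and a library is considered to fit the screen when the number of distinct combinations does not exceed the number of wells, i.e. with no allowance for oversampling. Both function names are hypothetical.

```python
def screen_capacity(plates=100, wells_per_plate=96):
    """Number of clones that can be assayed at one clone per well."""
    return plates * wells_per_plate


def max_two_state_sites(capacity):
    """Largest n such that all 2**n wild-type/mutant combinations fit.

    Assumes one clone per combination; real screens usually oversample.
    """
    sites = 0
    while 2 ** (sites + 1) <= capacity:
        sites += 1
    return sites


capacity = screen_capacity()            # 100 plates x 96 wells = 9600 clones
sites = max_two_state_sites(capacity)   # 13, since 2**13 = 8192 <= 9600 < 2**14
```

Under these assumptions, about 100 plates accommodate every combination of roughly 13 two-state sites; any oversampling to ensure coverage would lower that figure.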

[0197] A. Use of Restriction Enzyme Sites to Recombine Mutations

[0198] In some situations it is advantageous to use restriction enzyme sites in nucleic acids to direct the recombination of mutations in a nucleic acid sequence of interest. These techniques are particularly preferred in the evolution of fragments that cannot readily be shuffled by existing methods due to the presence of repeated DNA or other problematic primary sequence motifs. They are also preferred for shuffling large fragments (typically greater than 10 kb), such as gene clusters that cannot be readily shuffled and “PCR-amplified” because of their size. Although fragments up to 50 kb have been reported to be amplified by PCR (Barnes, Proc. Natl. Acad. Sci. (U.S.A.) 91:2216-2220 (1994)), amplification can be problematic for fragments over 10 kb, and thus alternative methods for shuffling in the range of 10-50 kb and beyond are preferred. Preferably, the restriction endonucleases used are of the Class II type (Sambrook et al., Molecular Cloning, CSH Press, 1987) and of these, preferably those which generate nonpalindromic sticky end overhangs such as AlwNI, SfiI or BstXI. These enzymes generate nonpalindromic ends that allow for efficient ordered reassembly with DNA ligase. Typically, restriction enzyme (or endonuclease) sites are identified by conventional restriction enzyme mapping techniques (Sambrook et al., Molecular Cloning, CSH Press, 1987), by analysis of sequence information for that gene, or by introduction of desired restriction sites into a nucleic acid sequence by synthesis (i.e. by incorporation of silent mutations).

[0199] The DNA substrate molecules to be digested can either be from in vivo replicated DNA, such as a plasmid preparation, or from PCR amplified nucleic acid fragments harboring the restriction enzyme recognition sites of interest, preferably near the ends of the fragment. Typically, at least two variants of a gene of interest, each having one or more mutations, are digested with at least one restriction enzyme determined to cut within the nucleic acid sequence of interest. The restriction fragments are then joined with DNA ligase to generate full length genes having shuffled regions. The number of regions shuffled will depend on the number of cuts within the nucleic acid sequence of interest. The shuffled molecules can be introduced into cells as described above and screened or selected for a desired property. Nucleic acid can then be isolated from pools (libraries) or clones having desired properties and subjected to the same procedure until a desired degree of improvement is obtained.
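The relationship between the number of cuts and the number of shuffled regions can be sketched by enumeration. The example below is illustrative only: it assumes that every variant is cut at the same positions and that ligation is strictly ordered (as nonpalindromic overhangs are intended to ensure), so that region i of a product always comes from region i of some variant. The toy sequences, cut positions, and function names are invented for the illustration.

```python
from itertools import product


def split_at(seq, cut_positions):
    """Split a sequence into consecutive regions at the given positions."""
    bounds = [0, *sorted(cut_positions), len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def ordered_reassemblies(variants, cut_positions):
    """All full-length products obtainable by ordered religation.

    With c cuts there are c + 1 regions; each region is chosen
    independently from any variant.
    """
    fragment_sets = [split_at(v, cut_positions) for v in variants]
    per_region = zip(*fragment_sets)  # region i taken from every variant
    return {"".join(choice) for choice in product(*per_region)}


# Two toy variants differing by one base in each of three regions.
variants = ["AAAACCCCGGGG", "AAATCCCAGGGT"]
library = ordered_reassemblies(variants, cut_positions=[4, 8])
# Two cuts give three regions, so two variants yield 2**3 = 8 products,
# including both parents.
```

More generally, under these assumptions v variants cut at c shared sites give at most v raised to the power (c + 1) distinct products, fewer if some variants are identical within a region.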

[0200] In some embodiments, at least one DNA substrate molecule or fragment thereof is isolated and subjected to mutagenesis. In some embodiments, the pool or library of religated restriction fragments is subjected to mutagenesis before the digestion-ligation process is repeated. “Mutagenesis” as used herein comprises such techniques known in the art as PCR mutagenesis, oligonucleotide-directed mutagenesis, site-directed mutagenesis, etc., and recursive sequence recombination by any of the techniques described herein.

[0201] An example of the use of this format is in the manipulation of polyketide clusters. Polyketide clusters (Khosla et al., TIBTECH 14, September 1996) are typically 10 to 100 kb in length, specifying multiple large polypeptides which assemble into very large multienzyme complexes. Due to the modular nature of these complexes and the modular nature of the biosynthetic pathway, nucleic acids encoding protein modules can be exchanged between different polyketide clusters to generate novel and functional chimeric polyketides. The introduction of rare restriction endonuclease sites such as SfiI (eight base recognition, nonpalindromic overhangs) at nonessential sites between polypeptides or in introns engineered within polypeptides would provide “handles” with which to manipulate exchange of nucleic acid segments using the technique described above.

[0202] B. Reassembly PCR

[0203] A further technique for recursively recombining mutations in a nucleic acid sequence utilizes “reassembly PCR”. This method can be used to assemble multiple segments that have been separately evolved into a full length nucleic acid template such as a gene. This technique is performed when a pool of advantageous mutants is known from previous work or has been identified by screening mutants that may have been created by any mutagenesis technique known in the art, such as PCR mutagenesis, cassette mutagenesis, doped oligo mutagenesis, chemical mutagenesis, or propagation of the DNA template in vivo in mutator strains. Boundaries defining segments of a nucleic acid sequence of interest preferably lie in intergenic regions, introns, or areas of a gene not likely to have mutations of interest. Preferably, oligonucleotide primers (oligos) are synthesized for PCR amplification of segments of the nucleic acid sequence of interest, such that the sequences of the oligonucleotides overlap the junctions of two segments. The overlap region is typically about 10 to 100 nucleotides in length. Each of the segments is amplified with a set of such primers. The PCR products are then “reassembled” according to assembly protocols such as those used in Sections IA-B above to assemble randomly fragmented genes. In brief, in an assembly protocol the PCR products are first purified away from the primers, by, for example, gel electrophoresis or size exclusion chromatography. Purified products are mixed together and subjected to about 1-10 cycles of denaturing, reannealing, and extension in the presence of polymerase and deoxynucleoside triphosphates (dNTP's) and appropriate buffer salts in the absence of additional primers (“self-priming”). Subsequent PCR with primers flanking the gene is used to amplify the yield of the fully reassembled and shuffled genes. This method is necessarily “coarse grain” and hence only recombines mutations in a blockwise fashion, an advantage for some searches such as when recombining allelic variants of multiple genes within an operon.

[0204] In some embodiments, the resulting reassembled genes are subjected to mutagenesis before the process is repeated.

[0205] In some embodiments, oligonucleotides that incorporate uracil into the primers are used for PCR amplification. Typically uracil is incorporated at one site in the oligonucleotide. The products are treated with uracil glycosylase, thereby generating a single-stranded overhang, and are reassembled in an ordered fashion by a method such as disclosed by Rashtchian (Current Biology, 6:30-36 (1995)).

[0206] In a further embodiment, the PCR primers for amplification of segments of the nucleic acid sequence of interest are used to introduce variation into the gene of interest as follows. Mutations at sites of interest in a nucleic acid sequence are identified by screening or selection, by sequencing homologues of the nucleic acid sequence, and so on. Oligonucleotide PCR primers are then synthesized which encode wild type or mutant information at sites of interest. These primers are then used in PCR mutagenesis to generate libraries of full length genes encoding permutations of wild type and mutant information at the designated positions. This technique is typically advantageous in cases where the screening or selection process is expensive, cumbersome, or impractical relative to the cost of sequencing the genes of mutants of interest and synthesizing mutagenic oligonucleotides.

[0207] An example of this method is the evolution of an improved Taq polymerase, as described in detail below. Mutant proteins resulting from application of the method are identified and assayed in a sequencing reaction to identify mutants with improved sequencing properties. This is typically done in a high throughput format (see, for example, Broach et al. Nature 384 (Supp): 14-16 (1996)) to yield, after screening, a small number, e.g., about 2 to 100, of candidate recombinants for further evaluation. The mutant genes can then be sequenced to provide information regarding the location of the mutation. The corresponding mutagenic oligonucleotide primers can be synthesized from this information, and used in a reassembly reaction as described above to efficiently generate a library with an average of many mutations per gene. Thus, multiple rounds of this protocol allow an efficient search for improved variants of the Taq polymerase.

[0208] C. Enrichment for Mutant Sequence Information

[0209] In some embodiments of the invention, recombination reactions, such as those discussed above, are enriched for mutant sequences so that the multiple mutant spectrum, i.e. possible combinations of mutations, is more efficiently sampled. The rationale for this is as follows. Assume that a number, n, of mutant clones with improved activity is obtained, wherein each clone has a single point mutation at a different position in the nucleic acid sequence. If this population of mutant clones with an average of one mutation of interest per nucleic acid sequence is then put into a recombination reaction, the resulting population will still have an average of one mutation of interest per nucleic acid sequence as defined by a Poisson distribution, leaving the multiple mutation spectrum relatively unpopulated.
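The rationale above can be made concrete with a short calculation. The following is a minimal sketch, assuming (as the paragraph states) that the number of mutations of interest per recombinant follows a Poisson distribution; the function names are illustrative and are not part of the disclosed methods. It computes the fraction of clones carrying at least a given number of mutations for a given mean, which indicates how the multiple mutation spectrum fills in as the mean is raised by enrichment.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability that a clone carries exactly k mutations of interest."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def fraction_with_at_least(k_min: int, mean: float) -> float:
    """Fraction of clones carrying k_min or more mutations of interest."""
    return 1.0 - sum(poisson_pmf(k, mean) for k in range(k_min))


if __name__ == "__main__":
    # Illustrative means only: 1.0 matches the unenriched case in the text;
    # the larger values are hypothetical outcomes of enrichment.
    for mean in (1.0, 2.0, 4.0):
        for k_min in (2, 4):
            frac = fraction_with_at_least(k_min, mean)
            print(f"mean={mean:.1f}  P(>= {k_min} mutations) = {frac:.3f}")
```

Under these assumptions, raising the mean number of mutations per clone shifts a larger share of the library into the multiply mutated classes, which is the effect the enrichment step is intended to achieve.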

[0210] The amount of screening required to identify recombinants having two or more mutations can be dramatically reduced by the following technique. The nucleic acid sequences of interest are obtained from a pool of mutant clones and prepared as fragments, typically by digestion with a restriction endonuclease, sonication, or by PCR amplification. The fragments are denatured, then allowed to reanneal, thereby generating mismatched hybrids where one strand of a mutant has hybridized with a complementary strand from a different mutant or wild-type clone. The reannealed products are then fragmented into fragments of about 20-100 bp, for example, by the use of DNAse I. This fragmentation reaction has the effect of segregating regions of the template containing mismatches (mutant information) from those encoding wild type sequence. The mismatched hybrids can then be affinity purified using aptamers, dyes, or other agents which bind to mismatched DNA. A preferred embodiment is the use of a mutS protein affinity matrix (Wagner et al., Nucleic Acids Res. 23(19):3944-3948 (1995); Su et al., Proc. Natl. Acad. Sci. (U.S.A.), 83:5057-5061 (1986)) with a preferred step of amplifying the affinity-purified material in vitro prior to an assembly reaction. This amplified material is then put into an assembly PCR reaction as described above. Optionally, this material can be titrated against the original mutant pool (e.g., from about 100% to 10% of the mutS enriched pool) to control the average number of mutations per clone in the next round of recombination.

[0211] Another application of this method is in the assembly of gene constructs that are enriched for polymorphic bases occurring as natural or selected allelic variants or as differences between homologous genes of related species. For example, one may have several varieties of a plant that are believed to have heritable variation in a trait of interest (e.g., drought resistance). It then is of interest to construct a library of these variant genes containing many mutations per gene. MutS selection can be applied in combination with the assembly techniques described herein to generate such a pool of recombinants that are highly enriched for polymorphic (“mutant”) information. In some embodiments, the pool of recombinant genes is provided in a transgenic host. Recombinants can be further evolved by PCR amplification of the transgene from transgenic organisms that are determined to have an improved phenotype and applying the formats described in this invention to further evolve them.

[0212] D. Intron-driven Recombination

[0213] In some instances, the substrate molecules for recombination have uniformly low homology, sporadically distributed regions of homology, or the region of homology is relatively small (for example, about 10-100 bp), such as phage displayed peptide ligands. These factors can reduce the efficiency and randomness of recombination in RSR. In some embodiments of the invention, this problem is addressed by the introduction of introns between coding exons in sequences encoding protein homologues. In further embodiments of the invention, introns can be used (Chong et al., J. Biol. Chem., 271:22159-22168 (1996)).

[0214] In this method, a nucleic acid sequence, such as a gene or gene family, is arbitrarily defined to have segments. The segments are preferably exons. Introns are engineered between the segments. Preferably, the intron inserted between the first and second segments is at least about 10% divergent from the intron inserted between second and third segments, the intron inserted between second and third segments is at least about 10% divergent from the introns inserted between any of the previous segment pairs, and so on through segments n and n+1. The introns between any given set of exons will thus initially be identical between all clones in the library, whereas the exons can be arbitrarily divergent in sequence. The introns therefore provide homologous DNA sequences that will permit application of any of the described methods for RSR while the exons can be arbitrarily small or divergent in sequence, and can evolve to achieve an arbitrarily large degree of sequence divergence without a significant loss in efficiency in recombination. Restriction sites can also be engineered into the intronic nucleic acid sequence of interest so as to allow a directed reassembly of restriction fragments. The starting exon DNA may be synthesized de novo from sequence information, or may be present in any nucleic acid preparation (e.g., genomic, cDNA, libraries, and so on). For example, 1 to 10 nonhomologous introns can be designed to direct recombination of the nucleic acid sequences of interest by placing them between exons. The sequence of the introns can be all or partly obtained from known intron sequence. Preferably, the introns are self-splicing. Ordered sets of introns and exon libraries are assembled into functional genes by standard methods (Sambrook et al., Molecular Cloning, CSH Press (1987)).
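The preference that each engineered intron be at least about 10% divergent from the others can be checked computationally before synthesis. The following is a minimal sketch, assuming the candidate intron sequences have already been aligned and trimmed to equal length so that divergence can be taken as the percentage of mismatched positions; the function names and the handling of the threshold are illustrative assumptions, not part of the disclosed methods.

```python
from itertools import combinations


def percent_divergence(seq_a: str, seq_b: str) -> float:
    """Percentage of mismatched positions between two pre-aligned sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    mismatches = sum(a != b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * mismatches / len(seq_a)


def introns_sufficiently_divergent(introns, threshold: float = 10.0) -> bool:
    """True if every pair of introns differs by at least `threshold` percent."""
    return all(
        percent_divergence(a, b) >= threshold
        for a, b in combinations(introns, 2)
    )


if __name__ == "__main__":
    # Toy 10-nt sequences for illustration only; real introns are far longer.
    candidates = ["AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAATT"]
    print(introns_sufficiently_divergent(candidates))        # pairwise >= 10%
    print(introns_sufficiently_divergent(candidates, 15.0))  # stricter cutoff
```

A pairwise check is used here because the paragraph requires each intron to be divergent from all of the previously placed introns, not only from its immediate neighbor.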

[0215] Any of the formats for in vitro or in vivo recombination described herein can be applied for recursive exon shuffling. A preferred format is to use nonpalindromic restriction sites such as Sfi I placed into the intronic sequences to promote shuffling. Pools of selected clones are digested with Sfi I and religated. The nonpalindromic overhangs promote ordered reassembly of the shuffled exons. These libraries of genes can be expressed and screened for desired properties, then subjected to further recursive rounds of recombination by this process. In some embodiments, the libraries are subjected to mutagenesis before the process is repeated.
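The requirement that the overhangs direct ordered reassembly can be expressed as a simple design check. The following is a minimal sketch under stated assumptions: each junction is described by the overhang sequence left on one strand after digestion, an end can ligate to another end only when the overhangs are exactly complementary, and the example overhangs are hypothetical three-nucleotide sequences chosen for illustration. A set of junction overhangs is accepted only if all overhangs are distinct and none is the reverse complement of any overhang in the set (itself included), since either condition would permit out-of-order or inverted joining.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def supports_ordered_assembly(overhangs) -> bool:
    """Check that junction overhangs cannot mis-pair with one another.

    Rejects the set if two junctions share an overhang (fragments could join
    out of order) or if any overhang is the reverse complement of an overhang
    in the set (fragments could join in inverted orientation; an overhang
    equal to its own reverse complement is palindromic).
    """
    normalized = [o.upper() for o in overhangs]
    seen = set(normalized)
    if len(seen) != len(normalized):
        return False
    return all(reverse_complement(o) not in seen for o in normalized)


if __name__ == "__main__":
    print(supports_ordered_assembly(["ATC", "GGA", "TCA"]))  # compatible set
    print(supports_ordered_assembly(["ATC", "GAT"]))         # GAT = rc(ATC)
    print(supports_ordered_assembly(["ACGT"]))               # palindromic
```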

[0216] An example of how the introduction of an intron into a mammalian library format would be used advantageously is as follows. An intron containing a lox (Sauer et al., Proc. Natl. Acad. Sci. (U.S.A.), 85:5166-5170 (1988)) site is arbitrarily introduced between amino acids 92 and 93 in each alpha interferon parental substrate. A library of 10⁴ chimeric interferon genes is made for each of the two exons (residues 1-92 and residues 93-167), cloned into a replicating plasmid vector, and introduced into target cells. The number 10⁴ is arbitrarily chosen for convenience in screening. An exemplary vector for expression in mammalian cells would contain an SV40 origin, with the host cells expressing SV40 large T antigen, so as to allow transient expression of the interferon constructs. The cells are challenged with a cytopathic virus such as vesicular stomatitis virus (VSV) in an interferon protection assay (e.g., Meister et al., J. Gen. Virol. 67:1633-1643, (1986)). Cells surviving due to expression of interferon are recovered, the two libraries of interferon genes are PCR amplified, and recloned into a vector that can be amplified in E. coli. The amplified plasmids are then transfected at high multiplicity (e.g. 10 micrograms of plasmid per 10⁶ cells) into a cre expressing host that can support replication of that vector. The presence of cre in the host cells promotes efficient recombination at the lox site in the interferon intron, thus shuffling the selected sets of exons. This population of cells is then used in a second round of selection by viral challenge and the process is applied recursively. In this format, the cre recombinase is preferably expressed transiently on a cotransfected molecule that cannot replicate in the host. Thus, after segregation of recombinants from the cre expressing plasmid, no further recombination will occur and selection can be performed on genetically stable exon permutations. The method can be used with more than one intron, with recombination enhancing sequences other than cre/lox (e.g., int/xis, etc.), and with other vector systems such as but not limited to retroviruses, adenovirus or adeno-associated virus.

[0217] E. Synthetic Oligonucleotide Mediated Recombination

[0218] 1. Oligo Bridge Across Sequence Space

[0219] In some embodiments of the invention, a search of a region of sequence space defined by a set of substrates, such as members of a gene family, having less than about 80%, more typically, less than about 50% homology, is desired. This region, which can be part or all of a gene or a gene family, is arbitrarily delineated into segments. The segment borders can be chosen randomly, based on correspondence with natural exons, based on structural considerations (loops, alpha helices, subdomains, whole domains, hydrophobic core, surface, dynamic simulations), or based on correlations with genetic mapping data.

[0220] Typically, the segments are then amplified by PCR with a pool of “bridge” oligonucleotides at each junction. Thus, if the set of five genes is broken into three segments A, B and C, and if there are five versions of each segment (A1, A2, . . . C4, C5), twenty-five oligonucleotides are made for each strand of the A-B junctions where each bridge oligo has 20 bases of homology to one of the A and one of the B segments. In some cases, the number of required oligonucleotides can be reduced by choosing segment boundaries that are identical in some or all of the gene family members. Oligonucleotides are similarly synthesized for the B-C junction. The family of A domains is amplified by PCR with an outside generic A primer and the pool of A-B junction oligonucleotides; the B domains with the A-B plus the B-C bridge oligonucleotides, and the C domains with the B-C bridge oligonucleotides plus a generic outside primer. Full length genes are then made by assembly PCR or by the dUTP/uracil glycosylase methods described above. Preferably, products from this step are subjected to mutagenesis before the process of selection and recombination is repeated, until a desired level of improvement or the evolution of a desired property is obtained. This is typically determined using a screening or selection as appropriate for the protein and property of interest.
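The count of twenty-five bridge oligonucleotides per strand at each junction follows from pairing every version of the upstream segment with every version of the downstream segment. The following is a minimal sketch of that enumeration, assuming each bridge oligo is formed from the last 20 bases of one upstream segment joined to the first 20 bases of one downstream segment; the function names, the dictionary layout, and the toy sequences are illustrative assumptions only.

```python
from itertools import product


def bridge_oligo(left_segment: str, right_segment: str, arm: int = 20) -> str:
    """Join the 3' end of the left segment to the 5' end of the right segment."""
    if len(left_segment) < arm or len(right_segment) < arm:
        raise ValueError("segments must be at least `arm` bases long")
    return left_segment[-arm:] + right_segment[:arm]


def junction_pool(left_versions: dict, right_versions: dict, arm: int = 20) -> dict:
    """Build one bridge oligo for every (left version, right version) pair."""
    return {
        (left_name, right_name): bridge_oligo(left_seq, right_seq, arm)
        for (left_name, left_seq), (right_name, right_seq)
        in product(left_versions.items(), right_versions.items())
    }


if __name__ == "__main__":
    # Placeholder sequences; real segments would come from the gene family.
    a_versions = {f"A{i}": "ACGT" * 10 for i in range(1, 6)}
    b_versions = {f"B{i}": "TTGCA" * 8 for i in range(1, 6)}
    pool = junction_pool(a_versions, b_versions)
    print(len(pool), "bridge oligos for one strand of the A-B junction")
```

Where several family members share an identical boundary sequence, duplicate oligos in the pool collapse to a single sequence, which corresponds to the reduction in oligonucleotide number noted in the paragraph above.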

[0221] This method is illustrated below for the recombination of eleven homologous human alpha interferon genes.

[0222] 2. Site Directed Mutagenesis (SDM) with Oligonucleotides Encoding Homologue Mutations Followed by Shuffling

[0223] In some embodiments of the invention, sequence information from one or more substrate sequences is added to a given “parental” sequence of interest, with subsequent recombination between rounds of screening or selection. Typically, this is done with site-directed mutagenesis performed by techniques well known in the art (Sambrook et al., Molecular Cloning, CSH Press (1987)) with one substrate as template and oligonucleotides encoding single or multiple mutations from other substrate sequences, e.g. homologous genes. After screening or selection for an improved phenotype of interest, the selected recombinant(s) can be further evolved using RSR techniques described herein. After screening or selection, site-directed mutagenesis can be done again with another collection of oligonucleotides encoding homologue mutations, and the above process repeated until the desired properties are obtained.

[0224] When the difference between two homologues is one or more single point mutations in a codon, degenerate oligonucleotides can be used that encode the sequences in both homologues. One oligo may include many such degenerate codons and still allow one to exhaustively search all permutations over that block of sequence. An example of this is provided below for the evolution of alpha interferon genes.
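The design of a degenerate codon that encodes the sequences found in two homologues can be sketched as follows. This is a minimal illustration, assuming the standard IUPAC nucleotide ambiguity codes and codons written as DNA; the function names and the example codons are illustrative and are not taken from the interferon example. At each codon position the set of bases observed across the homologues is replaced by the matching ambiguity code, and the degenerate codon can be expanded again to list every codon it encodes.

```python
from itertools import product

# Standard IUPAC ambiguity codes, keyed by the sorted set of bases they cover.
IUPAC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}
CODE_TO_BASES = {code: bases for bases, code in IUPAC_CODES.items()}


def degenerate_codon(codons) -> str:
    """Smallest IUPAC-degenerate codon covering all of the given codons."""
    codons = [c.upper() for c in codons]
    if not codons or any(len(c) != 3 for c in codons):
        raise ValueError("expected one or more 3-nucleotide codons")
    return "".join(
        IUPAC_CODES["".join(sorted({c[i] for c in codons}))] for i in range(3)
    )


def expand(degenerate: str):
    """List every concrete sequence encoded by a degenerate sequence."""
    choices = (CODE_TO_BASES[ch] for ch in degenerate.upper())
    return ["".join(p) for p in product(*choices)]


if __name__ == "__main__":
    print(degenerate_codon(["GCT", "GTT"]))   # one position differs
    print(expand(degenerate_codon(["GCT", "ATT"])))  # two positions differ
```

Note that when two homologue codons differ at more than one position, the degenerate codon also encodes codons found in neither homologue, so the expanded list is worth inspecting before synthesis. An oligo carrying k independent two-fold degenerate codons encodes 2^k combinations over that block.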

[0225] When the homologue sequence space is very large, it can be advantageous to restrict the search to certain variants. Thus, for example, computer modelling tools (Lathrop et al., J. Mol. Biol., 255:641-665 (1996)) can be used to model each homologue mutation onto the target protein and discard any mutations that are predicted to grossly disrupt structure and function.

[0226] F. Recombination Directed by Host Machinery

[0227] In some embodiments of the invention, DNA substrate molecules are introduced into cells, wherein the cellular machinery directs their recombination. For example, a library of mutants is constructed and screened or selected for mutants with improved phenotypes by any of the techniques described herein. The DNA substrate molecules encoding the best candidates are recovered by any of the techniques described herein, then fragmented and used to transfect a mammalian host and screened or selected for improved function. The DNA substrate molecules are recovered from the mammalian host, such as by PCR, and the process is repeated until a desired level of improvement is obtained. In some embodiments, the fragments are denatured and reannealed prior to transfection, coated with recombination stimulating proteins such as recA, or co-transfected with a selectable marker such as Neo^(R) to allow the positive selection for cells receiving recombined versions of the gene of interest.

[0228] For example, this format is preferred for the in vivo affinity maturation of an antibody by RSR. In brief, a library of mutant antibodies is generated, as described herein for the 48G7 affinity maturation. This library is FACS purified with ligand to enrich for the 0.1-10% of antibodies with the highest affinity. The V region genes are recovered by PCR, fragmented, and cotransfected or electroporated with a vector into which reassembled V region genes can recombine. DNA substrate molecules are recovered from the cotransfected cells, and the process is repeated until the desired level of improvement is obtained. Other embodiments include reassembling the V regions prior to the electroporation so that an intact V region exon can recombine into an antibody expression cassette. Further embodiments include the use of this format for other eukaryotic genes or for the evolution of whole viruses.

[0229] G. Phagemid-Based Assembly

[0230] In some embodiments of the invention, a gene of interest is cloned into a vector that generates single stranded DNA, such as a phagemid. The resulting DNA substrate is mutagenized by RSR by any method known in the art, transfected into host cells, and subjected to a screen or selection for a desired property or improved phenotype. DNA from the selected or screened phagemids is amplified, by, for example, PCR or plasmid preparation. This DNA preparation contains the various mutant sequences that one wishes to permute. This DNA is fragmented and denatured, and annealed with single-stranded DNA (ssDNA) phagemid template (ssDNA encoding the wild-type gene and vector sequences). A preferred embodiment is the use of dut(−) ung(−) host strains such as CJ236 (Sambrook et al., Molecular Cloning, CSH Press (1987)) for the preparation of ssDNA.

[0231] Gaps in annealed template are filled with DNA polymerase and ligated to form closed relaxed circles. Since multiple fragments can anneal to the phagemid, the newly synthesized strand now consists of shuffled sequences. These products are transformed into a mutS strain of E. coli which is dut+ ung+. Phagemid DNA is recovered from the transfected host and subjected again to this protocol until the desired level of improvement is obtained. The gene encoding the protein of interest in this library of recovered phagemid DNA can be mutagenized by any technique, including RSR, before the process is repeated.

[0232] III. Improved Protein Expression

[0233] While recombinant DNA technology has proved to be a very general method for obtaining large, pure, and homogeneous quantities of almost all nucleic acid sequences of interest, similar generality has not yet been achieved for the production of large amounts of pure, homogeneous protein in recombinant form. A likely explanation is that protein expression, folding, localization and stability are intrinsically more complex and unpredictable than for DNA. The yield of expressed protein is a complex function of transcription rates, translation rates, interactions with the ribosome, interaction of the nascent polypeptide with chaperonins and other proteins in the cell, efficiency of oligomerization, interaction with components of secretion and other protein trafficking pathways, protease sensitivity, and the intrinsic stability of the final folded state. Optimization of such complex processes is well suited for the application of RSR. The following methods detail strategies for application of RSR to the optimization of protein expression.

[0234] A. Evolution of Mutant Genes with Improved Expression Using RSR on Codon Usage Libraries

[0235] The negative effect of rare E. coli codons on expression of recombinant proteins in this host has been clearly demonstrated (Rosenberg, et al., J. Bact. 175:716-722 (1993)). However, general rules for the choice of codon usage patterns to optimize expression of functional protein have been elusive. In some embodiments of the invention, protein expression is optimized by changing codons used in the gene of interest, based on the degeneracy of the genetic code. Typically, this is accomplished by synthesizing the gene using degenerate oligonucleotides. In some embodiments the degenerate oligonucleotides have the general structure of about 20 nucleotides of identity to a DNA substrate molecule encoding a protein of interest, followed by a region of about 20 degenerate nucleotides which encode a region of the protein, followed by another region of about 20 nucleotides of identity. In a preferred embodiment, the region of identity utilizes preferred codons for the host. In a further embodiment, the oligonucleotides are identical to the DNA substrate at at least one 5′ and one 3′ nucleotide, but have at least 85% sequence homology to the DNA substrate molecule, with the difference due to the use of degenerate codons. In some embodiments, a set of such degenerate oligonucleotides is used in which each oligonucleotide overlaps with another by the general formula n−10, wherein n is the length of the oligonucleotide. Such oligonucleotides are typically about 20-1000 nucleotides in length. The assembled genes are then cloned, expressed, and screened or selected for improved expression. The assembled genes can be subjected to recursive recombination methods as described above until the desired improvement is achieved.
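The size of a codon usage library follows directly from the degeneracy of the genetic code. The following is a minimal sketch, assuming the standard genetic code and a peptide written in one-letter amino acid codes; the function name and example peptides are illustrative assumptions. It counts how many distinct DNA sequences encode the same peptide, which is the number of synonymous variants a fully degenerate region spanning that peptide could contain.

```python
from itertools import product
from math import prod

# Standard genetic code in the conventional TCAG ordering ('*' marks stops).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Map each amino acid to the list of codons that encode it.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def count_encodings(peptide: str) -> int:
    """Number of distinct DNA sequences that encode the given peptide."""
    return prod(len(SYNONYMS[aa]) for aa in peptide.upper())


if __name__ == "__main__":
    # A degenerate region of about 20 nucleotides spans six or seven codons.
    for peptide in ("MW", "MKL", "LSRLSR"):
        print(peptide, count_encodings(peptide))
```

Because the count is a product over positions, regions rich in leucine, serine or arginine (six codons each) generate far larger libraries than regions rich in methionine or tryptophan (one codon each), which may inform where degenerate regions are placed.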

[0236] For example, this technique can be used to evolve bovine intestinal alkaline phosphatase (BIAP) for active expression in E. coli. This enzyme is commonly used as a reporter gene in assay formats such as ELISA. The cloned gene cannot be expressed in active form in a prokaryotic host such as E. coli in good yield. Development of such an expression system would allow one to access inexpensive expression technology for BIAP and, importantly, for engineered variants with improved activity or chemical coupling properties (such as chemical coupling to antibodies). A detailed example is provided in the Experimental Examples section.

[0237] B. Improved Folding

[0238] In some embodiments of the invention, proteins of interest when overexpressed or expressed in heterologous hosts form inclusion bodies, with the majority of the expressed protein being found in insoluble aggregates. Recursive sequence recombination techniques can be used to optimize folding of such target proteins. There are several ways to improve folding, including evolving the target protein of interest and evolving chaperonin proteins.

[0239] 1. Evolving A Target Protein

[0240] a. Inclusion Body Fractionation Selection Using lac Headpiece Dimer Fusion Protein

[0241] The lac repressor “headpiece dimer” is a small protein containing two headpiece domains connected by a short peptide linker which binds the lac operator with sufficient affinity that polypeptide fusions to this headpiece dimer will remain bound to the plasmid that encodes them throughout an affinity purification process (Gates et al., J. Mol. Biol. 255:373-386 (1995)). This property can be exploited, as follows, to evolve mutant proteins of interest with improved folding properties. The protein of interest can be mammalian, yeast, bacterial, etc.

[0242] A fusion protein between the lac headpiece dimer and a target protein sequence is constructed, for example, as disclosed by Gates (supra). This construct, containing at least one lac operator, is mutagenized by technologies common in the arts such as PCR mutagenesis, chemical mutagenesis, oligo directed mutagenesis (Sambrook et al., Molecular Cloning, CSH Press (1987)). The resulting library is transformed into a host cell, and expression of the fusion protein is induced, preferably with arabinose. An extract or lysate is generated from a culture of the library expressing the construct. Insoluble protein is fractionated from soluble protein/DNA complexes by centrifugation or affinity chromatography, and the yield of soluble protein/DNA complexes is quantitated by quantitative PCR (Sambrook et al., Molecular Cloning, CSH Press, 1987) of the plasmid. Preferably, a reagent that is specific for properly folded protein, such as a monoclonal antibody or a natural ligand, is used to purify soluble protein/DNA complexes. The plasmid DNA from this step is isolated, subjected to RSR and again expressed. These steps are repeated until the yield of soluble protein/DNA complexes has reached a desired level of improvement. Individual clones are then screened for retention of functional properties of the protein of interest, such as enzymatic activity, etc.

[0243] This technique is generically useful for evolving solubility and other properties such as cellular trafficking of proteins heterologously expressed in a host cell of interest. For example, one could select for efficient folding and nuclear localization of a protein fused to the lac repressor headpiece dimer by encoding the protein on a plasmid encoding an SV40 origin of replication and a lac operator, and transiently expressing the fusion protein in a mammalian host expressing T antigen. Purification of protein/DNA complexes from nuclear HIRT extracts (Seed and Aruffo, Proc. Natl. Acad. Sci. (U.S.A.), 84:3365-3369 (1987)) would allow one to select for efficient folding and nuclear localization of proteins.

[0244] b. Functional Expression of Protein Using Phage Display

[0245] A problem often encountered in phage display methods such as those disclosed by O'Neil et al. (Current Biology, 5:443-449 (1995)) is the inability to functionally express a protein of interest on phage. Without being limited to any one theory, improper folding of the protein of interest can be responsible for this problem. RSR can be used to evolve a protein of interest for functional expression on phage. Typically, a fusion protein is constructed between gene III or gene VIII and the target protein and then mutagenized, for example by PCR mutagenesis. The mutagenized library is then expressed in a phage display format, a phage lysate is made, and these phage are affinity selected for those bearing functionally displayed fusion proteins using an affinity matrix containing a known ligand for the target protein. DNA from the functionally selected phage is purified, and the displayed genes of interest are shuffled and recloned into the phage display format. The selection, shuffling and recloning steps are repeated until the yield of phage with functional displayed protein has reached desired levels as defined, for example, by the fraction of phage that are retained on a ligand affinity matrix or the biological activity associated with the displayed phage. Individual clones are then screened to identify candidate mutants with improved display properties, desired level of expression, and functional properties of interest (e.g., ability to bind a ligand or receptor, lymphokine activity, enzymatic activity, etc.).

[0246] In some embodiments of the invention, a functional screen or selection is used to identify an evolved protein not expressed on a phage. The target protein, which cannot initially be efficiently expressed in a host of interest, is mutagenized and a functional screen or selection is used to identify cells expressing functional protein. For example, the protein of interest may complement a function in the host cell, cleave a colorimetric substrate, etc. Recursive sequence recombination is then used to rapidly evolve improved functional expression from such a pool of improved mutants.

[0247] For example, AMV reverse transcriptase is of particular commercial importance because it is active at a higher temperature (42° C.) and is more robust than many other reverse transcriptases. However, it is difficult to express in prokaryotic hosts such as E. coli, and is consequently expensive because it has to be purified from chicken cells. Thus an evolved AMV reverse transcriptase that can be expressed efficiently in E. coli is highly desirable.

[0248] In brief, the AMV reverse transcriptase gene (Papas et al., J. Cellular Biochem 20:95-103 (1982)) is mutagenized by any method common in the art. The library of mutant genes is cloned into a colE1 plasmid (Amp resistant) under control of the lac promoter in a polA12 (Ts) recA718 (Sweasy et al. Proc. Natl. Acad. Sci. U.S.A. 90:4626-4630 (1993)) E. coli host. The library is induced with IPTG, and shifted to the nonpermissive temperature. This selects for functionally expressed reverse transcriptase genes under the selective conditions reported for selection of active HIV reverse transcriptase mutants by Kim et al. (Proc. Natl. Acad. Sci. (U.S.A.), 92:684-688 (1995)). The selected AMV RTX genes are recovered by PCR by using oligonucleotides flanking the cloned gene. The resulting PCR products are subjected to in vitro RSR, selected as described above, and the process is repeated until the level of functional expression is acceptable. Individual clones are then screened for RNA-dependent DNA polymerization and other properties of interest (e.g. half life at room temperature, error rate). The candidate clones are subjected to mutagenesis, and then tested again to yield an AMV RT that can be expressed in E. coli at high levels.

[0249] 2. Evolved Chaperonins

[0250] In some embodiments of the invention, overexpression of a protein can lead to the accumulation of folding intermediates which have a tendency to aggregate. Without being limited to any one theory, the role of chaperonins is thought to be to stabilize such folding intermediates against aggregation; thus, overexpression of a protein of interest can lead to overwhelming the capacity of chaperonins. Chaperonin genes can be evolved using the techniques of the invention, either alone or in combination with the genes encoding the protein of interest, to overcome this problem.

[0251] Examples of proteins of interest which are especially suited to this approach include but are not limited to: cytokines; malarial coat proteins; T cell receptors; antibodies; industrial enzymes (e.g., detergent proteases and detergent lipases); viral proteins for use in vaccines; and plant seed storage proteins.

[0252] Sources of chaperonin genes include but are not limited to E. coli chaperonin genes encoding such proteins as thioredoxin, Gro ES/Gro EL, PapD, ClpB, DsbA, DsbB, DnaJ, DnaK, and GrpE; mammalian chaperonins such as Hsp70, Hsp72, Hsp73, Hsp40, Hsp60, Hsp10, Hdj1, TCP-1, Cpn60, BiP; and the homologues of these chaperonin genes in other species such as yeast (J. G. Wall and A. Pluckthun, Current Biology, 6:507-516 (1995); Hartl, Nature, 381:571-580 (1996)). Additionally, heterologous genomic or cDNA libraries can be used as libraries to select or screen for novel chaperonins.

[0253] In general, evolution of chaperonins is accomplished by first mutagenizing chaperonin genes, screening or selecting for improved expression of the target protein of interest, subjecting the mutated chaperonin genes to RSR, and repeating selection or screening. As with all RSR techniques, this is repeated until the desired improvement of expression of the protein of interest is obtained. Two exemplary approaches are provided below.

[0254] a. Chaperonin Evolution in Trans to the Protein of Interest With a Screen or Selection for Improved Function

[0255] In some embodiments the chaperonin genes are evolved independently of the gene(s) for the protein of interest. The improvement in the evolved chaperonin can be assayed, for example, by screening for enhancement of the activity of the target protein itself or for the activity of a fusion protein comprising the target protein and a selectable or screenable protein (e.g., GFP, alkaline phosphatase or beta-galactosidase).

[0256] b. Chaperonin Operon in cis

[0257] In some embodiments, the chaperonin genes and the target protein genes are encoded on the same plasmid, but not necessarily evolved together. For example, a lac headpiece dimer can be fused to the protein target to allow for selection of plasmids which encode soluble protein. Chaperonin genes are provided on this same plasmid (“cis”) and are shuffled and evolved rather than the target protein. Similarly, the chaperonin genes can be cloned onto a phagemid plasmid that encodes a gene III or gene VIII fusion with a protein of interest. The cloned chaperonins are mutagenized and, as with the selection described above, phage expressing functionally displayed fusion protein are isolated on an affinity matrix. The chaperonin genes from these phage are shuffled and the cycle of selection, mutation and recombination is applied recursively until fusion proteins are efficiently displayed in functional form.

[0258] 3. Improved Intracellular Localization

[0259] Many overexpressed proteins of biotechnological interest are secreted into the periplasm or media to give advantages in purification or activity assays. Optimization for high level secretion is difficult because the process is controlled by many genes and hence optimization may require multiple mutations affecting the expression level and structure of several of these components. Protein secretion in E. coli, for example, is known to be influenced by many proteins including: a secretory ATPase (SecA), a translocase complex (SecB, SecD, SecE, SecF, and SecY), chaperonins (DnaK, DnaJ, GroES, GroEL), signal peptidases (LepB, LspA, Ppp), specific folding catalysts (DsbA) and other proteins of less well defined function (e.g., Ffh, FtsY) (Sandkvist et al., Curr. Op. Biotechnol. 7:505-511 (1996)). Overproduction of wild type or mutant copies of the genes for these proteins can significantly increase the yield of mature secreted protein. For example, overexpression of secY or secY4 significantly increased the periplasmic yield of mature human IL6 from a hIL6-pre-OmpA fusion (Perez-Perez et al., Bio-Technology 12:178-180 (1994)). Analogously, overexpression of DnaK/DnaJ in E. coli improved the yield of secreted human granulocyte colony stimulating factor (Perez-Perez et al., Biochem. Biophys. Res. Commun. 210:254-259 (1995)).

[0260] RSR provides a route to evolution of one or more of the above named components of the secretory pathway. The following strategy is employed to optimize protein secretion in E. coli. Variations on this method, suitable for application to Bacillus subtilis, Pseudomonas, Saccharomyces cerevisiae, Pichia pastoris, mammalian cells and other hosts are also described. The general protocol is as follows.

[0261] One or more of the genes named above are obtained by PCR amplification from E. coli genomic DNA using known flanking sequence, and cloned in an ordered array into a plasmid or cosmid vector. These genes do not in general occur naturally in clusters, and hence these will comprise artificial gene clusters. The genes may be cloned under the control of their natural promoter or under the control of another promoter such as the lac, tac, arabinose, or trp promoters. Typically, rare restriction sites such as Sfi I are placed between the genes to facilitate ordered reassembly of shuffled genes as described in the methods of the invention.

[0262] The gene cluster is mutagenized and introduced into a host cell in which the gene of interest can be inducibly expressed. Expression of the target gene to be secreted and of the cloned genes is induced by standard methods for the promoter of interest (e.g., addition of 1 mM IPTG for the lac promoter). The efficiency of protein secretion by a library of mutants is measured, for example by the method of colony blotting (Skerra et al., Anal. Biochem. 196:151-155 (1991)). Those colonies expressing the highest levels of secreted protein (the top 0.1-10%; preferably the top 1%) are picked. Plasmid DNA is prepared from these colonies and shuffled according to any of the methods of the invention.
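The picking step above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `pick_top_fraction`, the colony identifiers, and the idea of a single numeric signal per colony (for example, a densitometry value from a colony blot) are assumptions made for the example and are not part of the described method.

```python
def pick_top_fraction(signals, fraction=0.01):
    """Return the IDs of the colonies with the highest secretion signal.

    signals  -- dict mapping a colony ID to a measured signal
                (hypothetical units, e.g. blot intensity)
    fraction -- share of the library to pick; the text gives a range of
                0.001-0.10 (0.1-10%) with 0.01 (1%) preferred
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the interval (0, 1]")
    # Always pick at least one colony, even from a very small library.
    n_pick = max(1, round(len(signals) * fraction))
    # Rank colony IDs from the strongest to the weakest signal.
    ranked = sorted(signals, key=signals.get, reverse=True)
    return ranked[:n_pick]
```

With a library of 1000 scored colonies and the preferred 1% cutoff, the function returns the 10 highest-scoring colony IDs, whose plasmid DNA would then be prepared and shuffled.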

[0263] Preferably, each individual gene is amplified from the population and subjected to RSR. The fragments are digested with Sfi I (introduced between each gene with nonpalindromic overhangs designed to promote ordered reassembly by DNA ligase) and ligated together, preferably at low DNA concentration (<1 ng/microliter) to promote formation of covalently closed relaxed circles. Each of the PCR amplified gene populations may be shuffled prior to reassembly into the final gene cluster. The ligation products are transformed back into the host of interest and the cycle of selection and RSR is repeated.

[0264] Analogous strategies can be employed in other hosts such as Pseudomonas, Bacillus subtilis, yeast and mammalian cells. The homologs of the E. coli genes listed above are targets for optimization, and indeed many of these homologs have been identified in other species (Pugsley, Microb. Rev. 57:50-108 (1993)). In addition to these homologs, other components such as the six polypeptides of the signal recognition particle, the translocating chain-associating membrane protein (TRAM), BiP, the Ssa proteins and other hsp70 homologs, and prsA (B. subtilis) (Simonen and Palva, Microb. Rev. 57:109-137 (1993)) are targets for optimization by RSR. In general, replicating episomal vectors such as SV40-neo (Sambrook et al., Molecular Cloning, CSH Press (1987), Northrup et al., J. Biol. Chem. 268(4):2917-2923 (1993)) for mammalian cells or 2 micron or ars plasmids for yeast (Strathern et al., The Molecular Biology of the Yeast Saccharomyces, CSH Press (1982)) are used. Integrative vectors such as pJM 103, pJM 113 or pSGMU2 are preferred for B. subtilis (Perego, Chap. 42, pp. 615-624 in: Bacillus subtilis and Other Gram-Positive Bacteria, A. Sonenshein, J. Hoch, and R. Losick, eds., 1993).

[0265] For example, an efficiently secreted thermostable DNA polymerase can be evolved, thus allowing the performance of DNA polymerization assays with little or no purification of the expressed DNA polymerase. Such a procedure would be preferred for the expression of libraries of mutants of any protein that one wished to test in a high throughput assay, for example any of the pharmaceutical proteins listed in Table I, or any industrial enzyme. Initial constructs are made by fusing a signal peptide such as that from STII or OmpA to the amino terminus of the protein to be secreted. A gene cluster of cloned genes believed to act in the secretory pathway of interest is mutagenized and coexpressed with the target construct. Individual clones are screened for expression of the gene product. The secretory gene clusters from improved clones are recovered and recloned and introduced back into the original host. Preferably, they are first subjected to mutagenesis before the process is repeated. This cycle is repeated until the desired improvement in expression of secreted protein is achieved.

[0266] IV. Evolved Polypeptide Properties

[0267] A. Evolved Transition State Analog and Substrate Binding

[0268] There are many enzymes of industrial interest that have substantially suboptimal activity on the substrate of interest. In many of these cases, the enzyme obtained from nature is required to work either under conditions that are very different from the conditions under which it evolved or to have activity towards a substrate that is different from the natural substrate.

[0269] The application of evolutionary technologies to industrial enzymes is often significantly limited by the types of selections that can be applied and the modest numbers of mutants that can be surveyed in screens. Selection of enzymes or catalytic antibodies, expressed in a display format, for binding to transition state analogs (McCafferty et al., Appl. Biochem. Biotechnol. 47:157-171 (1994)) or substrate analogs (Janda et al., Proc. Natl. Acad. Sci. (U.S.A.) 91:2532-2536 (1994)) represents a general strategy for selecting for mutants with improved catalytic efficiency.

[0270] Phage display (O'Neil et al., Current Biology 5:443-449 (1995)) and the other display formats (Gates et al., J. Mol. Biol. 255:373-386 (1995); Mattheakis et al., Proc. Natl. Acad. Sci. (U.S.A.) 91:9022-9026 (1994)) described herein represent general methodologies for applying affinity-based selections to proteins of interest. For example, Matthews and Wells (Science 260:1113-1117 (1993)) have used phage display of a protease substrate to select improved substrates. Display of active enzymes on the surface of phage, on the other hand, allows selection of mutant proteins with improved transition state analog binding. Improvements in affinity for transition state analogs correlate with improvements in catalytic efficiency. For example, Patten et al., Science 271:1086-1091 (1996) have shown that improvements in affinity of a catalytic antibody for its hapten are well correlated with improvements in catalytic efficiency, with an 80-fold improvement in kcat/Km being achieved for an esterolytic antibody.

[0271] For example, an enzyme used in antibiotic biosynthesis can be evolved for new substrate specificity and activity under desired conditions using phage display selections. Some antibiotics are currently made by chemical modifications of biologically produced starting compounds. Complete biosynthesis of the desired molecules is currently impractical because of the lack of an enzyme with the required enzymatic activity and substrate specificity (Skatrud, TIBTECH 10:324-329, September 1992). For example, 7-aminodeacetoxycephalosporanic acid (7-ADCA) is a precursor for semi-synthetically produced cephalosporins. 7-ADCA is made by a chemical ring expansion of penicillin G followed by enzymatic deacylation of the phenoxyacetal group. 7-ADCA can be made enzymatically from deacetylcephalosporin C (DAOC V), which could in turn be derived from penicillin V by enzymatic ring expansion if a suitably modified penicillin expandase could be evolved (Cantwell et al., Curr. Genet. 17:213-221 (1990)). Thus, 7-ADCA could in principle be produced enzymatically from penicillin V using a modified penicillin N expandase, such as mutant forms of the S. clavuligerus cefE gene (Skatrud, TIBTECH 10:324-329, September 1992). However, penicillin V is not accepted as a substrate by any known expandase with sufficient efficiency to be commercially useful. As outlined below, RSR techniques of the invention can be used to evolve the penicillin expandase encoded by cefE or other expandases so that they will use penicillin V as a substrate.

[0272] Phage display or other display format selections are applied to this problem by expressing libraries of cefE penicillin expandase mutants in a display format, selecting for binding to substrates or transition state analogs, and applying RSR to rapidly evolve high affinity binders. Candidates are further screened to identify mutants with improved enzymatic activity on penicillin V under desired reaction conditions, such as pH, temperature, solvent concentration, etc. RSR is applied to further evolve mutants with the desired expandase activity. A number of transition state analogs (TSA's) are suitable for this reaction. The following structure is the initial TSA that is used for selection of the display library of cefE mutants:

[0273] Libraries of the known penicillin expandases (Skatrud, TIBTECH 10:324-329 (1992); Cantwell et al., Curr. Genet. 17:213-221 (1990)) are made as described herein. The display library is subjected to selection for binding to penicillin V and/or to the transition state analog given above for the conversion of penicillin V to DAOC V. These binding selections may be performed under non-physiological reaction conditions, such as elevated temperature, to obtain mutants that are active under the new conditions. RSR is applied to evolve mutants with a 2-fold to 10⁵-fold improvement in binding affinity for the selecting ligand. When the desired level of improved binding has been obtained, candidate mutants are expressed in a high throughput format and specific activity for expanding penicillin V to DAOC V is quantitatively measured. Recombinants with improved enzymatic activity are mutagenized and the process repeated to further evolve them.

[0274] Retention of TSA binding by a displayed enzyme (e.g., phage display, lac headpiece dimer, polysome display, etc.) is a good selection for retention of the overall integrity of the active site and hence can be exploited to select for mutants which retain activity under conditions of interest. Such conditions include but are not limited to: different pH optima, broader pH optima, activity in altered solvents such as DMSO (Seto et al., DNA Sequence 5:131-140 (1995)) or formamide (Chen et al., Proc. Natl. Acad. Sci. (U.S.A.) 90:5618-5622 (1993)), altered temperature, improved shelf life, altered or broadened substrate specificity, or protease resistance. A further example, the evolution of a p-nitrophenyl esterase, using a mammalian display format, is provided below.

[0275] B. Improvement of DNA and RNA Polymerases

[0276] Of particular commercial importance are improved polymerases for use in nucleic acid sequencing and polymerase chain reactions. The following properties are attractive candidates for improvement of a DNA sequencing polymerase: (1) suppression of termination by inosine in labelled primer format (H. Dierick et al., Nucleic Acids Res. 21:4427-4428 (1993)), (2) more normalized peak heights, especially with fluorescently labelled dideoxy terminators (Parker et al., BioTechniques 19:116-121 (1995)), (3) better sequencing of high GC content DNA (>60% GC) by, for example, tolerating >10% DMSO (D. Seto et al., DNA Sequence 5:131-140 (1995); Scheidl et al., BioTechniques 19(5):691-694 (1995)), or (4) improved acceptance of novel base analogs such as inosine, 7-deaza dGTP (Dierick et al., Nucleic Acids Res. 21:4427-4428 (1993)) or other novel base analogs that improve the above properties.

[0277] Novel sequencing formats have been described which use matrix assisted laser desorption ionization time of flight (MALDI-TOF) mass spectroscopy to resolve dideoxy ladders (Smith, Nature Biotechnology 14:1084-1085 (1996)). It is noted in Smith's recent review that fragmentation of the DNA is the singular feature limiting the development of this method as a viable alternative to standard gel electrophoresis for DNA sequencing. Base analogs which stabilize the N-glycosidic bond by modifications of the purine bases to 7-deaza analogs (Kirpekar et al., Rapid Comm. in Mass Spec. 9:525-531 (1995)) or of the 2′ hydroxyl (such as 2′-H or 2′-F) “relieve greatly the mass range limitation” of this technique (Smith, 1996). Thus, evolved polymerases that can efficiently incorporate these and other base analogs conferring resistance to fragmentation under MALDI-TOF conditions are valuable innovations.

[0278] Other polymerase properties of interest for improvement by RSR are low fidelity thermostable DNA polymerase for more efficient mutagenesis or as a useful correlate for acceptance of base analogs for the purposes described above; higher fidelity polymerase for PCR (Lundberg et al., Gene 108:1-6 (1991)); higher fidelity reverse transcriptase for retroviral gene therapy vehicles to reduce mutation of the therapeutic construct and of the retrovirus; improved PCR of GC rich DNA and PCR with modified bases (S. Turner and F. J. Jenkins, BioTechniques 19(1):48-52 (1995)).

[0279] Thus, in some embodiments of the invention, libraries of mutant polymerase genes are screened by direct high throughput screening for improved sequencing properties. The best candidates are then subjected to RSR. Briefly, mutant libraries of candidate polymerases such as Taq polymerase are constructed using standard methods such as PCR mutagenesis (Caldwell et al., PCR Meth. App. 2:28-33 (1992)) and/or cassette mutagenesis (Sambrook et al., Molecular Cloning, CSH Press (1987)). Mutations in Taq DNA polymerase, such as the active site residue from T7 polymerase that improves acceptance of dideoxy nucleotides (Tabor and Richardson, J. Biol. Chem. 265:8322-8328 (1990)) and mutations that inactivate the 5′-3′ exonuclease activity (R. S. Rano, BioTechniques 18:390-396 (1995)), are incorporated into these libraries. The reassembly PCR technique, for example, as described above is especially suitable for this problem. Similarly, chimeric polymerase libraries are made by breeding existing thermophilic polymerases, sequenase, and E. coli polI with each other using the bridge oligonucleotide methods described above. The libraries are expressed in formats wherein human or robotic colony picking is used to replica pick individual colonies into 96 well plates where small cultures are grown, and polymerase expression is induced.

[0280] A high throughput, small scale simple purification for polymerase expressed in each well is performed. For example, simple single-step purifications of His-tagged Taq expressed in E. coli have been described (Smirnov et al., Russian J. Bioorganic Chem. 21(5):341-342 (1995)), and could readily be adapted for a 96-well expression and purification format.

[0281] A high throughput sequencing assay is used to perform sequencing reactions with the purified samples. The data are analyzed to identify mutants with improved sequencing properties, according to any of these criteria: higher quality ladders on GC-rich templates, especially greater than 60% GC, including such points as fewer artifactual termination products and stronger signals than given with the wild-type enzyme; less termination of reactions by inosine in primer labelled reactions, e.g., fluorescent labelled primers; less variation in incorporation of signals in reactions with fluorescent dideoxy nucleotides at any given position; longer sequencing ladders than obtained with the wild-type enzyme, such as about 20 to 100 nucleotides; improved acceptance of other known base analogs such as 7-deaza purines; improved acceptance of new base analogs from combinatorial chemistry libraries (See, for example, Hogan, Nature 384(Supp):17 (1996)).

[0282] The best candidates are then subjected to mutagenesis, and then selected or screened for the improved sequencing properties described above.

[0283] In another embodiment, a screen or selection is performed as follows. The replication of a plasmid can be placed under obligate control of a polymerase expressed in E. coli or another microorganism. The effectiveness of this system has been demonstrated for making plasmid replication dependent on mammalian polymerase beta (Sweasy et al., Proc. Natl. Acad. Sci. (U.S.A.) 90:4626-4630 (1993)), Taq polymerase (Suzuki et al., Proc. Natl. Acad. Sci. (U.S.A.) 93:9670-9675 (1996)), or HIV reverse transcriptase (Kim et al., Proc. Natl. Acad. Sci. (U.S.A.) 92:684-688 (1995)). The mutant polymerase gene is placed on a plasmid bearing a colE1 origin and expressed under the control of an arabinose promoter. The library is enriched for active polymerases essentially as described by Suzuki et al. (supra), with polymerase expression being induced by the presence of arabinose in the culture.

[0284] A further quantitative screen utilizes the presence of GFP (green fluorescent protein) on the same plasmid, replica plating onto arabinose at the nonpermissive temperature in the absence of a selective antibiotic, and using a fluorimeter to quantitatively measure fluorescence of each culture. GFP activity correlates with plasmid stability and copy number, which is in turn dependent on expression of active polymerase.

[0285] A polymerase with a very high error rate would be a superior sequencing enzyme, as it would have a more normalized signal for incorporation of base analogs such as the currently used fluorescently labelled dideoxies because it will have reduced specificity and selectivity. The error rates of currently used polymerases are on the order of 10⁻⁵ to 10⁻⁶, orders of magnitude lower than what can be detected given the resolving power of the gel systems. An error rate of 1%, and possibly as high as 10%, could not be detected by current gel systems, and thus there is a large window of opportunity to increase the “sloppiness” of the enzyme. An error-prone cycling polymerase would have other uses such as for hypermutagenesis of genes by PCR.
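The arithmetic behind this window can be made explicit with a short Python sketch. It rests on a simplifying assumption that is not stated in the text: misincorporations are treated as independent events occurring with the same probability at every position. Under that assumption, the share of extension products carrying the correct base at a given position is one minus the error rate, so a 1% error rate still leaves 99% of the signal at each ladder position correct. The function names are hypothetical.

```python
def correct_base_fraction(error_rate):
    """Fraction of extension products with the correct base at one position.

    Assumes independent, position-uniform misincorporation (an assumption
    made for this illustration only).
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    return 1.0 - error_rate


def expected_misincorporations(error_rate, product_length):
    """Expected number of misincorporated bases in one extension product."""
    if product_length < 0:
        raise ValueError("product_length must be non-negative")
    return error_rate * product_length
```

For a 500-nucleotide extension product, `expected_misincorporations(1e-5, 500)` is 0.005, whereas `expected_misincorporations(1e-2, 500)` is 5; in both cases the population-averaged signal at any single position remains dominated by the correct base.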

[0286] In some embodiments, the system described by Suzuki (Suzuki et al., Proc. Natl. Acad. Sci. (U.S.A.) 93:9670-9675 (1996)) is used to make replication of a reporter plasmid dependent on the expressed polymerase. This system puts replication of the first 200-300 bases next to the ColE1 origin directly under the control of the expressed polymerase (Sweasy and Loeb, J. Bact. 177:2923-2925 (1995); Sweasy et al., Proc. Natl. Acad. Sci. (U.S.A.) 90:4626-4630 (1993)). A screenable or selectable reporter gene containing stop codons is positioned in this region, such as LacZ alpha containing one, two or three stop codons. The constructs are grown on arabinose at the nonpermissive temperature, allowed to recover, and plated on selective lactose minimal media that demands reversion of the stop codons in the reporter cassette. Mutant polymerases are recovered from the survivors by PCR. The survivors are enriched for mutators because their mutator phenotype increases the rate of reversion of stop codons in the reporter lacZ alpha fragment.

[0287] The polymerase genes from the survivors are subjected to RSR, then the polymerase mutants are retransformed into the indicator strain. Mutators can be visually screened by plating on arabinose/Xgal plates at the nonpermissive temperature. Mutator polymerases will give rise to colonies with a high frequency of blue papillae due to reversion of the stop codon(s). Candidate papillators can be rescreened by picking a non-papillating region of the most heavily papillated colonies (i.e., “best” colonies) and replating on the arabinose/Xgal indicator medium to further screen for colonies with increased papillation rates. These steps are repeated until a desired reversion rate is achieved (e.g., 10⁻² to 10⁻³ mutations per base pair per replication).

[0288] Colonies which exhibit high frequency papillation are candidates for encoding an error prone polymerase. These candidates are screened for improved sequencing properties essentially as for the high throughput screen described above. Briefly, mutant Taq proteins are expressed and purified in a 96-well format. The purified proteins are used in sequencing reactions and the sequence data are analyzed to identify mutants that exhibit the improvements outlined herein. Mutants with improved properties are subjected to RSR and rescreened for further improvements in function.

[0289] In some embodiments, GFP containing stop codons instead of lacZ alpha with stop codons is used for the construction. Cells with reverted stop codons in GFP are selected by fluorescence activated cell sorter (FACS). In general, FACS selection is performed by gating the brightest cells (about 0.1-10%, preferably the top 0.1 to 1%), which are collected according to a protocol similar to that of Dangl et al. (Cytometry 2(6):395-401 (1982)). In other embodiments, the polA gene is flanked with lox sites or other targets of a site specific recombinase. The recombinase is induced, thus allowing one to inducibly delete the polA gene (Mulbery et al., Nucleic Acid Res. 23:485-490 (1995)). This would allow one to perform “Loeb-type” selections at any temperature and in any host. For example, one could set up such a selection in a recA deficient mesophile or thermophile by placing the polA homologue in an inducibly deletable format and thus apply the selection for active polymerase under more general conditions.

[0290] In further embodiments, this general system is preferred for directed in vivo mutagenesis of genes. The target gene is cloned into the region near a plasmid origin of replication that puts its replication obligately under control of the error prone polymerase. The construct is passaged through a polA(ts) recA strain and grown at the nonpermissive temperature, thus specifically mutagenizing the target gene while replicating the rest of the plasmid with high fidelity.

[0291] In other embodiments, selection is based on the ability of mutant DNA polymerases to PCR amplify DNA under altered conditions or by utilizing base analogs. The mutant polymerases act on the template that encodes them in a PCR amplification, thus differentially replicating those polymerases.

[0292] In brief, an initial library of mutants is replica plated. Polymerase preparations are done in a 96-well format. Crude plasmid preparations are made of the same set. Each plasmid prep is PCR-amplified using the polymerase prep derived from that plasmid under the conditions for which one wishes to optimize the polymerase (e.g., added DMSO or formamide, altered temperature of denaturation or extension, altered buffer salts, PCR with base analogs such as α-thiol dNTP's for use with mass spectroscopy sequencing, PCR of GC rich DNA (>60% GC), PCR with novel base analogs such as 7-deaza purines, 2′ fluoro dNTP's, rNTP's, PCR with inosine, etc.). The amplified genes are pooled, cloned, and subjected to mutagenesis, and the process repeated until an improvement is achieved.

[0293] C. Evolved Phosphonatase

[0294] Alkaline phosphatase is a widely used reporter enzyme for ELISA assays, protein fusion assays, and in a secreted form as a reporter gene for mammalian cells. The chemical lability of p-nitrophenyl phosphate (pNPP) substrates and the existence of cellular phosphatases that cross-react with pNPP are important limitations on the sensitivity of assays using this reporter gene. A reporter gene with superior signal to noise properties can be developed based on hydrolysis of p-nitrophenyl phosphonates, which are far more stable to base catalyzed hydrolysis than the corresponding phosphates. Additionally, there are far fewer naturally occurring cellular phosphonatases than alkaline phosphatases. Thus a p-nitrophenyl phosphonatase is an attractive replacement for alkaline phosphatase because the background due to chemical and enzymatic hydrolysis is much lower. This will allow one to make ELISA's more sensitive for detecting very small concentrations of antigen.

[0295] Chen et al. (J. Mol. Biol. 234:165-178 (1993)) have shown that a Staph. aureus beta-lactamase can hydrolyze p-nitrophenyl phosphonate esters with single turnover kinetics. The active site Ser70 (the active site nucleophile for beta lactam hydrolysis) forms a covalent intermediate with the substrate. This is analogous to the first step in hydrolysis of beta lactams, and this enzyme can be evolved by RSR to hydrolyze phosphonates by a mechanism analogous to beta lactam hydrolysis. Metcalf and Wanner have described a cryptic phosphonate utilizing operon (phn) in E. coli, and have constructed strains bearing deletions of the phn operon (J. Bact. 175:3430-3442 (1993)). This paper discloses selections for growth of E. coli on phosphate free minimal media where the phosphorus is derived from hydrolysis of alkyl phosphonates by genes in the phn operon. Thus, one could select for evolved p-nitrophenyl phosphonatases that are active using biochemical selections on defined minimal media. Specifically, an efficient phosphonatase is evolved as follows. A library of mutants of the Staph. aureus beta lactamase or of one of the E. coli phn enzymes is constructed. The library is transformed into E. coli mutants wherein the phn operon has been deleted, and selected for growth on phosphate free MOPS minimal media containing p-nitrophenyl phosphonate. RSR is applied to selected mutants to further evolve the enzyme for improved hydrolysis of p-nitrophenyl phosphonates.

[0296] D. Evolved Detergent Proteases

[0297] Proteases and lipases are added in large quantities to detergents to enzymatically degrade protein and lipid stains on clothes. The incorporation of these enzymes into detergents has significantly reduced the need for surfactants in detergents with a consequent reduction in the cost of formulation of detergents and improvement in stain removal properties. Proteases with improved specific activity, improved range of protein substrate specificity, improved shelf life, improved stability at elevated temperature, and reduced requirements for surfactants would add value to these products.

[0298] As an example, subtilisin can be evolved as follows. The cloned subtilisin gene (von der Osten et al., J. Biotechnol. 28:55-68 (1993)) can be subjected to RSR using growth selections on complex protein media by virtue of secreted subtilisin degrading the complex protein mixture. More specifically, libraries of subtilisin mutants are constructed in an expression vector which directs the mutant protein to be secreted by Bacillus subtilis. Bacillus hosts transformed with the libraries are grown in minimal media with complex protein formulation as carbon and/or nitrogen source. Subtilisin genes are recovered from fast growers and subjected to RSR, then screened for improvement in a desired property.

[0299] E. Escape of Phage from a “Protein Net”

[0300] In some embodiments, selection for improved proteases is performed as follows. A library of mutant protease genes is constructed on a display phage and the phage grown in a multiwell format or on plates. The phage are overlaid with a “protein net” which ensnares the phage. The net can consist of a protein or proteins engineered with surface disulphides and then crosslinked with a library of peptide linkers. A further embodiment employs an auxiliary matrix to further trap the phage. The phage are further incubated, then washed to collect liberated phage wherein the displayed protease was able to liberate the phage from the protein net. The protease genes are then subjected to RSR for further evolution. A further embodiment employs a library of proteases encoded by but not displayed on a phagemid wherein streptavidin is fused to pIII by a peptide linker. The library of protease mutants is evolved to cleave the linker by selecting phagemids on a biotin column between rounds of amplification.

[0301] In a further embodiment, the protease is not necessarily provided in a display format. The host cells secrete the protease encoded by but not surface displayed by a phagemid, while constrained to a well, for example, in a microtiter plate. Phage display format is preferred where an entire high titre lysate is encased in a protein net matrix, and the phage expressing active, broad specificity proteases digest the matrix and are liberated for the next round of amplification, mutagenesis, and selection.

[0302] In a further embodiment, the phage are not constrained to a well but, rather, protein binding filters are used to make colony or plaque lifts, which are screened for activity with chromogenic or fluorogenic substrates. Colonies or plaques corresponding to positive spots on the filters are picked and the encoded protease genes are recovered by, for example, PCR. The protease genes are then subjected to RSR for further evolution.

[0303] F. Screens for Improved Protease Activity

[0304] Peptide substrates containing fluorophores attached to the carboxy terminus and fluorescence quenching moieties on the amino terminus, such as those described by Holskin et al. (Anal. Biochem. 227:148-55 (1995)) (e.g., (4-4′-dimethylaminophenazo)benzoyl-arg-gly-val-val-asn-ala-ser-ser-arg-leu-ala-5-[(2′-aminoethyl)-amino]-naphthalene-1-sulfonic acid) are used to screen protease mutants for broadened or altered specificity. In brief, a library of peptide substrates is designed with a fluorophore on the amino terminus and a potent fluorescence quencher on the carboxy terminus, or vice versa. Supernatants containing secreted proteases are incubated either separately with various members of the library or with a complex cocktail. Those proteases which are highly active and have broad specificity will cleave the majority of the peptides, thus releasing the fluorophore from the quencher and giving a positive signal on a fluorimeter. This technique is amenable to a high density multiwell format.
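When supernatants are incubated separately with individual library members, the breadth of specificity of each protease can be summarized as the fraction of peptides that give a positive signal. The following Python sketch illustrates one way to compute that fraction. The function name, the data layout, and the rule that a peptide counts as cleaved when its fluorescence is at least three times the no-enzyme background are all assumptions chosen for the example; the text does not specify a scoring threshold.

```python
def cleaved_fraction(readings, background, fold_threshold=3.0):
    """Fraction of peptide substrates scored as cleaved by one supernatant.

    readings       -- dict: peptide ID -> fluorescence after incubation
    background     -- dict: peptide ID -> fluorescence of a no-enzyme control
    fold_threshold -- signal/background ratio counted as cleavage
                      (hypothetical cutoff, not taken from the text)
    """
    if not readings:
        raise ValueError("no substrate readings supplied")
    # A peptide is a hit when the quencher has been released, i.e. the
    # signal rises well above the uncleaved (quenched) background.
    hits = [pep for pep, signal in readings.items()
            if signal >= fold_threshold * background[pep]]
    return len(hits) / len(readings)
```

A supernatant that scores close to 1.0 cleaves most of the library and is a candidate broad specificity protease; a score near 0 with one strong hit would instead indicate narrow or altered specificity.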

[0305] G. Improving Pharmaceutical Proteins Using RSR

[0306] Table I lists proteins that are of particular commercial interest to the pharmaceutical industry. These proteins are all candidates for RSR evolution to improve function, such as ligand binding, shelf life, reduction of side effects through enhanced specificity, etc. All are well-suited to manipulation by the techniques of the invention. Additional embodiments especially applicable to this list are described below.

[0307] First, high throughput methods for expressing and purifying libraries of mutant proteins, similar to the methods described above for Taq polymerase, are applied to the proteins of Table I. These mutants are screened for activity in a functional assay. For example, mutants of IL2 are screened for resistance to plasma or tissue proteases with retention of activity for the low affinity IL2 receptor but with loss of activity on the high affinity IL2 receptor. The genes from mutants with improved activity relative to wild-type are recovered, and subjected to RSR to improve the phenotype further.

[0308] Preferably, the libraries are generated in a display format such that the mature folded protein is physically linked to the genetic information that encodes it. Examples include phage display using filamentous phage (O'Neil et al., Current Biology 5:443-449 (1995)) or bacteriophage lambda gene V display (Dunn, J. Mol. Biol. 248:497-506 (1995)), peptides on plasmids (Gates et al., J. Mol. Biol. 255:373-386 (1995)) where the polypeptide of interest is fused to a lac headpiece dimer and the nascent translation product binds to a lac operator site encoded on the plasmid or PCR product, and polysome display (Mattheakis et al., Proc. Natl. Acad. Sci. (U.S.A.) 91:9022-9026 (1994)) where ribosomes are stalled on mRNA molecules such that the nascent polypeptide is exposed for interaction with cognate ligands without disrupting the stalled ribosome/mRNA complex. Selected complexes are subjected to RT-PCR to recover the genes.

[0309] When so displayed, affinity binding of the recombinant phage is often done using a receptor for the protein of interest. In some cases it is impractical to obtain purified receptor with retention of all desired biological characteristics (for example, 7-transmembrane (7-TM) receptors). In such cases, one could use cells expressing the receptor as the panning substrate. For example, Barry et al. (Nat. Med. 2:299-305 (1996)) have described successful panning of M13 libraries against whole cells to obtain phage that bind to the cells expressing a receptor of interest. This format could be generally applied to any of the proteins listed in Table I.

[0310] In some embodiments, the following method can be used for selection. A lysate of phage encoding IFN alpha mutants, for example, can be used directly at suitable dilution to stimulate cells with a GFP reporter construct (Crameri et al., Nat. Biotechnol. 14:315-319 (1996)) under the control of an IFN responsive promoter, such as an MHC class I promoter. Phage remaining attached to the responsive cells after stimulation and expression can be recovered by FACS purification of those cells. Preferably, the brightest cells are collected. The phage are collected and their DNA subjected to RSR until the level of desired improvement is achieved.

[0311] Thus, for example, IL-3 is prepared in one of these display formats and subjected to RSR to evolve an agonist with a desired level of activity. A library of IL-3 mutants on a filamentous phage vector is created and affinity selected (“panned”) against purified IL-3 receptor to obtain mutants with improved affinity. The mutant IL-3 genes are recovered by PCR, subjected to RSR, and recloned into the display vector. The cycle is repeated until the desired affinity or agonist activity is achieved.

[0312] Many proteins of interest are expressed as dimers or higher order multimeric forms. In some embodiments, the display formats described above are preferentially applied to a single chain version of the protein. Mutagenesis, such as RSR, can be used in these display formats to evolve improved single chain derivatives of multimeric factors which initially have low but detectable activity. This strategy is described in more detail below.

[0313] H. Whole Cell Selections

[0314] In some embodiments, the eukaryotic cell is the unit of biological selection. The following general protocol can be used to apply RSR to the improvement of proteins using eukaryotic cells as the unit of selection: (1) transfection of libraries of mutants into a suitable host cell, (2) expression of the encoded gene product(s) either transiently or stably, (3) functional selection for cells with an improved phenotype (expression of a receptor with improved affinity for a target ligand, viral resistance, etc.), (4) recovery of the mutant genes by, for example, PCR or preparation of Hirt supernatants with subsequent transformation of E. coli, (5) RSR and (6) repetition of steps (1)-(5) until the desired degree of improvement is achieved.
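
The iterative cycle of steps (1)-(6) can be illustrated with a toy computational model. The following Python sketch is not part of the disclosed protocol. It assumes that each mutant is represented as a string, that a caller-supplied `fitness` function stands in for the functional selection of step (3), and that RSR is approximated by drawing each position of a child from a randomly chosen selected parent. All function names and parameter values are hypothetical.

```python
import random


def recombine(parents, rng):
    """Build one child by taking each position from a randomly chosen parent."""
    length = len(parents[0])
    return "".join(rng.choice(parents)[i] for i in range(length))


def rsr_cycle(population, fitness, n_keep, n_offspring, rng):
    """One round: keep the best variants, then recombine them with each other."""
    selected = sorted(population, key=fitness, reverse=True)[:n_keep]
    children = [recombine(selected, rng) for _ in range(n_offspring)]
    # Parents are carried forward, so the best score never decreases.
    return selected + children


def evolve(population, fitness, target, n_keep=8, n_offspring=20,
           max_cycles=50, seed=0):
    """Repeat selection and recombination until `target` is reached."""
    rng = random.Random(seed)
    for cycle in range(max_cycles):
        best = max(population, key=fitness)
        if fitness(best) >= target:
            return best, cycle
        population = rsr_cycle(population, fitness, n_keep, n_offspring, rng)
    return max(population, key=fitness), max_cycles


if __name__ == "__main__":
    # Eight hypothetical parents, each carrying one beneficial change ("B").
    parents = ["A" * i + "B" + "A" * (7 - i) for i in range(8)]
    best, cycles = evolve(parents, fitness=lambda s: s.count("B"), target=6)
    print(best, cycles)
```

In this model a beneficial change that is absent from every selected parent cannot reappear, which is one reason a sufficiently large selected pool is kept between rounds.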

[0315] For example, previous work has shown that one can use mammalian surface display to functionally select cells expressing cloned genes, such as using an antibody to clone the gene for an expressed surface protein (reviewed by Seed, Curr. Opin. Biotechnol. 6:567-573 (1995)). Briefly, cells are transiently transfected with libraries of cloned genes residing on replicating episomal vectors. An antibody directed against the protein of interest (whose gene one wishes to clone) is immobilized on a solid surface such as a plastic dish, and the transfected cells expressing the protein of interest are affinity selected.

[0316] For example, the affinity of an antibody for a ligand can be improved using mammalian surface display and RSR. Antibodies with higher affinity for their cognate ligands are then screened for improvement of one or more of the following properties: (1) improved therapeutic properties (increased cell killing, neutralization of ligands, activation of signal transduction pathways by crosslinking receptors), (2) improved in vivo imaging applications (detection of the antibody by covalent/noncovalent binding of a radionuclide or any agent detectable outside of the body by noninvasive means, such as NMR), (3) improved analytical applications (ELISA detection of proteins or small molecules), and (4) improved catalysts (catalytic antibodies). The methods described are general and can be extended to any receptor-ligand pair of interest. A specific example is provided in the experimental section.

[0317] The use of a one mutant sequence-one transfected cell protocol is a preferred design feature for RSR based protocols because the point is to use functional selection to identify mutants with improved phenotypes and, if the transfection is not done in a “clonal” fashion, the functional phenotype of any given cell is the result of the sum of many transfected sequences. Protoplast fusion is one method to achieve this end, since each protoplast contains typically greater than 50 copies each of a single plasmid variant. However, it is a relatively low efficiency process (about 10³-10⁴ transfectants), and it does not work well on some non-adherent cell lines such as B cell lines. Retroviral vectors provide a second alternative, but they are limited in the size of acceptable insert (<10 kb) and consistent, high expression levels are sometimes difficult to achieve. Random integration results in varying expression levels, thus introducing noise and limiting one's ability to distinguish between improvements in the affinity of the mutant protein vs. increased expression. A related class of strategies that can be used effectively to achieve “one gene-one cell” DNA transfer and consistent expression levels for RSR is to use a viral vector which contains a lox site and to introduce this into a host that expresses cre recombinase, preferably transiently, and contains one or more lox sites integrated into its genome, thus limiting the variability of integration sites (Rohlman et al. Nature Biotech. 14:1562-1565 (1996)).

[0318] An alternative strategy is to transfect with limiting concentrations of plasmid (i.e., about one copy per cell) using a vector that can replicate in the target cells, such as is the case with plasmids bearing SV40 origins transfected into COS cells. This strategy requires that either the host cell or the vector supply a replication factor such as SV40 large T antigen. Northrup et al. (J. Biol. Chem. 268:2917-2923 (1993)) describe a strategy wherein a stable transfectant expressing SV40 large T antigen is then transfected with vectors bearing SV40 origins. This format gave consistently higher transient expression and demonstrable plasmid replication, as assayed by sensitivity to digestion by Dpn I. Transient expression (i.e., non-integrating plasmids) is a preferred format for cellular display selections because it reduces the cycle time and increases the number of mutants that can be screened.
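
The trade-off in choosing a limiting plasmid concentration can be illustrated numerically. The sketch below assumes, purely for illustration, that the number of plasmids taken up per cell follows a Poisson distribution with a given mean; this statistical model and the function names are assumptions and are not taken from the cited work.

```python
import math


def poisson_pmf(k, mean):
    """Probability that a cell takes up exactly k plasmids (Poisson assumption)."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def transfected_fraction(mean):
    """Fraction of cells that take up at least one plasmid."""
    return 1.0 - poisson_pmf(0, mean)


def clonal_fraction(mean):
    """Among transfected cells, the fraction carrying exactly one plasmid."""
    return poisson_pmf(1, mean) / transfected_fraction(mean)


if __name__ == "__main__":
    for mean in (0.1, 0.5, 1.0, 3.0):
        print(mean, round(transfected_fraction(mean), 3),
              round(clonal_fraction(mean), 3))
```

Under this assumption, lowering the mean number of copies per cell raises the proportion of transfected cells that carry a single variant, at the cost of fewer transfected cells overall.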

[0319] The expression of SV40 large T antigen or other replication factors may have deleterious effects on or may work inefficiently in some cells. In such cases, RSR is applied to the replication factor itself to evolve mutants with improved activity in the cell type of interest. A generic protocol for evolving such a factor is as follows:

[0320] The target cell is transfected with a vector containing SV40 large T antigen, an SV40 origin, and a reporter gene such as GFP; a related format is cotransfection with limiting amounts of the SV40 large T antigen expression vector and an excess of a reporter such as GFP cloned onto an SV40 origin containing plasmid. Typically after 1-10 days of transient expression, the brightest cells are purified by FACS. SV40 large T antigen mutants are recovered by PCR, and subjected to mutagenesis. The cycle is repeated until the desired level of improvement is obtained.

[0321] I. Autocrine Selection

[0322] In some embodiments, mutant proteins are selected or screened based on their ability to exert a biological effect in an autocrine fashion on the cell expressing the mutant protein. For example, a library of alpha interferon genes can be selected for induction of more potent or more specific antiviral activity as follows. A library of interferon alpha mutants is generated in a vector which allows for induction of expression (e.g., under control of a metallothionein promoter) and efficient secretion in a multiwell format (96-well for example) with one or a few independent clones per well. In some embodiments, the promoter is not inducible, and may be constitutive.

[0323] Expression of the cloned interferon genes is induced. The cells are challenged with a cytotoxic virus against which one wishes to evolve an optimized interferon (for example vesicular stomatitis virus or HIV). Surviving cells are recovered. The cloned interferon genes are recovered by PCR amplification, subjected to RSR, and cloned back into the transfection vector and retransfected into the host cells. These steps are repeated until the desired level of antiviral activity is evolved.

[0324] In some embodiments, the virus of interest is not strongly cytotoxic. In this case a conditionally lethal gene, such as herpes simplex virus thymidine kinase, is cloned into the virus and after challenge with virus and recovery, conditionally lethal selective conditions are applied to kill cells that are infected with virus. An example of a conditionally lethal gene is herpes TK, which becomes lethal upon treating cells expressing this gene with the nucleoside analog acyclovir. In some embodiments, the antiproliferative activity of the cloned interferons is selected by treating the cells with agents that kill dividing cells (for example, DNA alkylating agents).

[0325] In some embodiments, potent cytokines are selected by expressing and secreting a library of cytokines in cells that have GFP or another reporter under control of a promoter that is induced by the cytokine, such as the MHC class I promoter being induced by evolved variants of alpha interferon. The signal transduction pathway is configured such that the wild type cytokine to be evolved gives a weak but detectable signal.

[0326] J. Half Life in Serum

[0327] In some embodiments of the invention, proteins are evolved by RSR to have improved half life in serum. A preferred method for improving half-life is evolving the affinity of a protein of interest for a long lived serum protein, such as an antibody or other abundant serum protein. Examples of how affinity for an antibody can enhance serum half life include the co-administration of IL2 and anti-IL2 antibodies which increases serum half-life and anti-tumor activity of human recombinant IL2 (Courtney et al., Immunopharmacology 28:223-232 (1994)).

[0328] The eight most abundant human serum proteins are serum albumin, immunoglobulins, lipoproteins, haptoglobin, fibrinogen, transferrin, alpha-1 antitrypsin, and alpha-2 macroglobulin (Doolittle, chapter 6, The Plasma Proteins F. Putnam, ed.; Academic Press, 1984). These and other abundant serum proteins such as ceruloplasmin and fibronectin are the primary targets against which to evolve binding sites on therapeutic proteins such as in Table I for the purpose of extending half-life. In the case of antibodies, the preferred strategy is to evolve affinity for constant regions rather than variable regions in order to minimize individual variation in the concentration of the relevant target epitope (antibody V region usage between different individuals is significantly variable).

[0329] Binding sites of the desired affinity are evolved by applying phage display, peptides on plasmid display or polysome display selections to the protein of interest. One could either mutagenize an existing binding site or otherwise defined region of the target protein, or append a peptide library to the N terminus, C terminus, or internally as a functionally nondisruptive loop.

[0330] In other embodiments of the invention, half life is improved by derivatization with PEG, other polymer conjugates or half-life extending chemical moieties. These are established methods for extending half-life of therapeutic proteins (R. Duncan, Clin. Pharmacokinet 27:290-306 (1994); Smith et al., TIBTECH 11:397-403 (1993)) and can have the added benefit of reducing immunogenicity (R. Duncan, Clin. Pharmacokinet 27:290-306 (1994)). However, derivatization can also result in reduced affinity of the therapeutic protein for its receptor or ligand. RSR is used to discover alternative sites in the primary sequence that can be substituted with lysine or other appropriate residues for chemical or enzymatic conjugation with half-life extending chemical moieties, and which result in proteins with maximal retention of biological activity.

[0331] A preferred strategy is to express a library of mutants of the protein in a display format, derivatize the library with the agent of interest (e.g., PEG) using chemistry that does not biologically inactivate the display system, select based on affinity for the cognate receptor, PCR amplify the genes encoding the selected mutants, shuffle, reassemble, reclone into the display format, and iterate until a mutant with the desired activity post modification is obtained. An alternative format is to express, purify and derivatize the mutants in a high throughput format, screen for mutants with optimized activity, recover the corresponding genes, subject the genes to RSR and repeat.

[0332] In further embodiments of the invention, binding sites for target human proteins that are localized in particular tissues of interest are evolved by RSR. For example, an interferon that localizes efficiently to the liver can be engineered to contain a binding site for a liver surface protein such as hepatocyte growth factor receptor. Analogously, one could evolve affinity for abundant epitopes on erythrocytes such as ABO blood antigens to localize a given protein to the blood stream.

[0333] In further embodiments of the invention, the protein of interest is evolved to have increased stability to proteases. For example, the clinical use of IL2 is limited by serious side effects that are related to the need to administer high doses. High doses are required to maintain a therapeutic level of IL2 because of its short half life (3-5 min, Lotze et al., JAMA 256(22):3117-3124 (1986)). One of the factors contributing to short half-lives of therapeutic proteins is proteolysis by serum proteases. Cathepsin D, a major renal acid protease, is responsible for the degradation of IL2 in Balb/c mice (Ohnishi et al., Cancer Res. 50:1107-1112 (1990)). Furthermore, Ohnishi showed that treatment of Balb/c mice with pepstatin, a potent inhibitor of this protease, prolongs the half life of recombinant human IL2 and augments lymphokine-activated killer cell activity in this mouse model.
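
The effect of a 3-5 minute half life on the dose that remains available can be made concrete with a short calculation. The sketch below assumes simple first-order (single exponential) elimination, which is an illustrative simplification and is not stated in the cited studies; the function name and the longer half life used for comparison are hypothetical.

```python
def fraction_remaining(minutes_elapsed, half_life_minutes):
    """Fraction of the initial dose left, assuming first-order elimination."""
    if half_life_minutes <= 0:
        raise ValueError("half life must be positive")
    return 0.5 ** (minutes_elapsed / half_life_minutes)


if __name__ == "__main__":
    # Compare the reported 3-5 min range with a hypothetical 60 min variant.
    for half_life in (3, 5, 60):
        print(half_life, fraction_remaining(30, half_life))
```

Under this assumption, a protein with a 3 minute half life is reduced to roughly one thousandth of its starting level within 30 minutes, which illustrates why extending half life can reduce the dose needed to maintain a therapeutic level.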

[0334] Thus, evolution of variants of IL2 or any of the proteins listed in Table I that are resistant to serum or kidney proteases is a preferred strategy for obtaining variants with extended serum half lives.

[0335] A preferred protocol is as follows. A library of the mutagenized protein of interest is expressed in a display system with a gene-distal epitope tag (e.g., on the N-terminus of a phage display construct such that if it is cleaved off by proteases, the epitope tag is lost). The expressed proteins are treated with defined proteases or with complex cocktails such as whole human serum. Affinity selection with an antibody to the gene-distal tag is performed. A second selection demanding biological function (e.g., binding to cognate receptor) is performed. Phage retaining the epitope tag (and hence protease resistant) are recovered and subjected to RSR. The process is repeated until the desired level of resistance is attained.

[0336] In other embodiments, the procedure is performed in a screening format wherein mutant proteins are expressed and purified in a high throughput format and screened for protease resistance with retention of biological activity.

[0337] In further embodiments of the invention, the protein of interest is evolved to have increased shelf life. A library of the mutagenized nucleic acid sequence encoding the protein of interest is expressed in a display format or high throughput expression format, and exposed for various lengths of time to conditions for which one wants to evolve stability (heat, metal ions, nonphysiological pH of, for example, <6 or >8, lyophilization, freeze-thawing). Genes are recovered from survivors, for example, by PCR. The DNA is subjected to mutagenesis, such as RSR, and the process repeated until the desired level of improvement is achieved.

[0338] K. Evolved Single Chain Versions of Multisubunit Factors

[0339] As discussed above, in some embodiments of the invention, the substrate for evolution by RSR is preferably a single chain construction. The possibility of performing asymmetric mutagenesis on constructs of homomultimeric proteins provides important new pathways for further evolution of such constructs that are not open to the proteins in their natural homomultimeric states. In particular, a given mutation in a homomultimer will result in that change being present in each identical subunit. In single chain constructs, however, the domains can mutate independently of each other.

[0340] Conversion of multisubunit proteins to single chain constructs with new and useful properties has been demonstrated for a number of proteins. Most notably, antibody heavy and light chain variable domains have been linked into single chain Fv's (Bird et al., Science 242:423-426 (1988)), and this strategy has resulted in antibodies with improved thermal stability (Young et al., FEBS Lett 377:135-139 (1995)), or sensitivity to proteolysis (Solar et al., Prot. Eng. 8:717-723 (1995)). A functional single chain version of IL5, a homodimer, has been constructed, shown to have affinity for the IL5 receptor similar to that of wild type protein, and this construct has been used to perform asymmetric mutagenesis of the dimer (Li et al., J. Biol. Chem. 271:1817-1820 (1996)). A single chain version of urokinase-type plasminogen activator has been made, and it has been shown that the single chain construct is more resistant to plasminogen activator inhibitor type 1 than the native homodimer (Higazi et al., Blood 87:3545-3549 (1996)). Finally, a single-chain insulin-like growth factor I/insulin hybrid has been constructed and shown to have higher affinity for chimeric insulin/IGF-1 receptors than that of either natural ligand (Kristensen et al., Biochem. J. 305:981-986 (1995)).

[0341] In general, a linker is constructed which joins the amino terminus of one subunit of a protein of interest to the carboxyl terminus of another subunit in the complex. These fusion proteins can consist of linked versions of homodimers, homomultimers, heterodimers or higher order heteromultimers. In the simplest case, one adds polypeptide linkers between the native termini to be joined. Two significant variations can be made. First, one can construct diverse libraries of variations of the wild type sequence in and around the junctions and in the linkers to facilitate the construction of active fusion proteins. Secondly, Zhang et al. (Biochemistry 32:12311-12318 (1993)) have described circular permutations of T4 lysozyme in which the native amino and carboxyl termini have been joined and novel amino and carboxyl termini have been engineered into the protein. The methods of circular permutation, libraries of linkers, and libraries of junctional sequences flanking the linkers allow one to construct libraries that are diverse in topological linkage strategies and in primary sequence. These libraries are expressed and selected for activity. Any of the above mentioned strategies for screening or selection can be used, with phage display being preferable in most cases. Genes encoding active fusion proteins are recovered, mutagenized, reselected, and subjected to standard RSR protocols to optimize their function. Preferably, a population of selected mutant single chain constructs is PCR amplified in two separate PCR reactions such that each of the two domains is amplified separately. Oligonucleotides are derived from the 5′ and 3′ ends of the gene and from both strands of the linker. The separately amplified domains are shuffled in separate reactions, then the two populations are recombined using PCR reassembly to generate intact single chain constructs for further rounds of selection and evolution.

[0342] V. Improved Properties of Pharmaceutical Proteins

[0343] A. Evolved Specificity for Receptor or Cell Type of Interest

[0344] The majority of the proteins listed in Table I are either receptors or ligands of pharmaceutical interest. Many agonists such as chemokines or interleukins agonize more than one receptor. Evolved mutants with improved specificity may have reduced side effects due to their loss of activity on receptors which are implicated in a particular side effect profile. For most of these ligand/receptors, mutant forms with improved affinity would have improved pharmaceutical properties. For example, an antagonistic form of RANTES with improved affinity for CKR5 should be an improved inhibitor of HIV infection by virtue of achieving greater receptor occupancy for a given dose of the drug. Using the selections and screens outlined above in combination with RSR, the affinities and specificities of any of the proteins listed in Table I can be improved. For example, the mammalian display format could be used to evolve TNF receptors with improved affinity for TNF.

[0345] Other examples include evolved interferon alpha variants that arrest tumor cell proliferation but do not stimulate NK cells, IL2 variants that stimulate the low affinity IL2 receptor complex but not the high affinity receptor (or vice versa), superantigens that stimulate only a subset of the V beta proteins recognized by the wild type protein (preferably a single V beta), antagonistic forms of chemokines that specifically antagonize only a receptor of interest, antibodies with reduced cross-reactivity, and chimeric factors that specifically activate a particular receptor complex. As an example of this latter case, one could make chimeras between IL2 and IL4, 7, 9, or 15 that also can bind the IL2 receptor alpha, beta and gamma chains (Theze et al., Imm. Today 17:481-486 (1996)), and select for chimeras that retain binding for the intermediate affinity IL2 receptor complex on monocytes but have reduced affinity for the high affinity IL2 alpha, beta, gamma receptor complex on activated T cells.

[0346] B. Evolved Agonists with Increased Potency

[0347] In some embodiments of the invention, a preferred strategy is the selection or screening for mutants with increased agonist activity using the whole cell formats described above, combined with RSR. For example, a library of mutants of IL3 is expressed in active form on phage as described by Gram et al. (J. Immun. Meth. 161:169-176 (1993)). Clonal lysates resulting from infection with plaque-purified phage are prepared in a high throughput format such as a 96-well microtiter format. An IL3-dependent cell line expressing a reporter gene such as GFP is stimulated with the phage lysates in a high throughput 96-well format. Phage that result in positive signals at the greatest dilution of phage supernatants are recovered; alternatively, DNA encoding the mutant IL3 can be recovered by PCR. In some embodiments, single cells expressing GFP under control of an IL3 responsive promoter can be stimulated with the IL3 phage library, and the positive cells sorted by FACS. The nucleic acid is then subjected to RSR, and the process repeated until the desired level of improvement is obtained.
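
The criterion of a positive signal at the greatest dilution can be expressed as a simple ranking of clones. The following Python sketch is illustrative only. It assumes that each clonal lysate has been assayed at several dilution factors, that a reporter reading at or above a chosen threshold counts as positive, and that the clone names, readings, and threshold shown are hypothetical.

```python
def endpoint_dilution(readings, threshold):
    """Return the greatest dilution factor whose signal is still positive.

    `readings` maps a dilution factor (e.g. 100 for 1:100) to a reporter
    signal. Returns None when no dilution gives a positive signal.
    """
    positive = [dilution for dilution, signal in readings.items()
                if signal >= threshold]
    return max(positive) if positive else None


def rank_clones(plate, threshold):
    """Order clones from most to least potent by endpoint dilution."""
    endpoints = {clone: endpoint_dilution(readings, threshold)
                 for clone, readings in plate.items()}
    hits = {clone: d for clone, d in endpoints.items() if d is not None}
    return sorted(hits, key=hits.get, reverse=True)


if __name__ == "__main__":
    plate = {  # hypothetical reporter readings per dilution factor
        "clone_A": {10: 900, 100: 400, 1000: 60},
        "clone_B": {10: 950, 100: 700, 1000: 300},
        "clone_C": {10: 40, 100: 20, 1000: 10},
    }
    print(rank_clones(plate, threshold=200))
```

Clones at the top of the ranking would be the ones carried forward for recovery of their DNA.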

TABLE I POLYPEPTIDE CANDIDATES FOR EVOLUTION

[0348] Name

[0349] Alpha-1 antitrypsin

[0350] Angiostatin

[0351] Antihemolytic factor

[0352] Apolipoprotein

[0353] Apoprotein

[0354] Atrial natriuretic factor

[0355] Atrial natriuretic polypeptide

[0356] Atrial peptides

[0357] C-X-C chemokines (e.g., T39765, NAP-2, ENA-78, Gro-a, Gro-b, Gro-c, IP-10, GCP-2, NAP-4, SDF-1, PF4, MIG)

[0358] Calcitonin

[0359] CC chemokines (e.g., Monocyte chemoattractant protein-1, Monocyte chemoattractant protein-2, Monocyte chemoattractant protein-3, Monocyte inflammatory protein-1 alpha, Monocyte inflammatory protein-1 beta, RANTES, I309, R83915, R91733, HCC1, T58847, D31065, T64262)

[0360] CD40 ligand

[0361] Collagen

[0362] Colony stimulating factor (CSF)

[0363] Complement factor 5a

[0364] Complement inhibitor

[0365] Complement receptor 1

[0366] Factor IX

[0367] Factor VII

[0368] Factor VIII

[0369] Factor X

[0370] Fibrinogen

[0371] Fibronectin

[0372] Glucocerebrosidase

[0373] Gonadotropin

[0374] Hedgehog proteins (e.g., Sonic, Indian, Desert)

[0375] Hemoglobin (for blood substitute; for radiosensitization)

[0376] Hirudin

[0377] Human serum albumin

[0378] Lactoferrin

[0379] Luciferase

[0380] Neurturin

[0381] Neutrophil inhibitory factor (NIF)

[0382] Osteogenic protein

[0383] Parathyroid hormone

[0384] Protein A

[0385] Protein G

[0386] Relaxin

[0387] Renin

[0388] Salmon calcitonin

[0389] Salmon growth hormone

[0390] Soluble complement receptor I

[0391] Soluble I-CAM 1

[0392] Soluble interleukin receptors (IL-1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15)

[0393] Soluble TNF receptor

[0394] Somatomedin

[0395] Somatostatin

[0396] Somatotropin

[0397] Streptokinase

[0398] Superantigens, i.e., Staphylococcal enterotoxins (SEA, SEB, SEC1, SEC2, SEC3, SED, SEE), Toxic shock syndrome toxin (TSST-1), Exfoliating toxins A and B, Pyrogenic exotoxins A, B, and C, and M. arthritidis mitogen

[0399] Superoxide dismutase

[0400] Thymosin alpha 1

[0401] Tissue plasminogen activator

[0402] Tumor necrosis factor beta (TNF beta)

[0403] Tumor necrosis factor receptor (TNFR)

[0404] Tumor necrosis factor-alpha (TNF alpha)

[0405] Urokinase

[0406] C. Evolution of Components of Eukaryotic Signal Transduction or Transcriptional Pathways

[0407] Using the screens and selections listed above, RSR can be used in several ways to modify eukaryotic signal transduction or transcriptional pathways. Any component of a signal transduction pathway of interest, or of the regulatory regions and the transcriptional activators that interact with these regions and with chemicals that induce transcription, can be evolved. This generates regulatory systems in which transcription is activated more potently by the natural inducer or by analogues of the normal inducer. This technology is preferred for the development and optimization of diverse assays of biotechnological interest. For example, dozens of 7 transmembrane receptors (7-TM) are validated targets for drug discovery (see, for example, Siderovski et al., Curr Biol., 6(2):211-212 (1996); An et al., FEBS Lett., 375(1-2):121-124 (1995); Raport et al., Gene, 163(2):295-299 (1995); Song et al., Genomics, 28(2):347-349 (1995); Strader et al. FASEB J., 9(9):745-754 (1995); Benka et al., FEBS Lett., 363(1-2):49-52 (1995); Spiegel, J. Clin Endocrinol. Metab., 81(7):2434-2442 (1996); Post et al., FASEB J., 10(7):741-749 (1996); Reisine et al., Ann NY Acad. Sci., 780:168-175 (1996); Spiegel, Annu. Rev. Physiol., 58:143-170 (1996); Barak et al., Biochemistry, 34(47):15407-15414 (1995); and Shenker, Baillieres Clin. Endocrinol. Metab., 9(3):427-451 (1995)). The development of sensitive high throughput assays for agonists and antagonists of these receptors is essential for exploiting the full potential of combinatorial chemistry in discovering such ligands. Additionally, biodetectors or biosensors for different chemicals can be developed by evolving 7-TM's to respond agonistically to novel chemicals or proteins of interest. In this case, selection would be for constructs that are activated by the new chemical or polypeptide to be detected. Screening could be done simply with fluorescence or light activated cell sorting, since the desired improvement is coupled to light production.

[0408] In addition to detection of small molecules such as pharmaceutical drugs and environmental pollutants, biosensors can be developed that will respond to any chemical for which there are receptors, or for which receptors can be evolved by recursive sequence recombination, such as hormones, growth factors, metals and drugs. The receptors may be intracellular and direct activators of transcription, or they may be membrane bound receptors that activate transcription of the signal indirectly, for example by a phosphorylation cascade. They may also not act on transcription at all, but may produce a signal by some post-transcriptional modification of a component of the signal generating pathway. These receptors may also be generated by fusing domains responsible for binding different ligands with different signalling domains. Again, recursive sequence recombination can be used to increase the amplitude of the signal generated to optimize expression and functioning of chimeric receptors, and to alter the specificity of the chemicals detected by the receptor.

[0409] For example, G proteins can be evolved to efficiently couple mammalian 7-TM receptors to yeast signal transduction pathways. There are 23 presently known G alpha protein loci in mammals which can be grouped by sequence and functional similarity into four groups, Gs (Gna, Gna1), Gi (Gnai-2, Gnai-3, Gnai-1, Gnao, Gnat-1, Gnat-2, Gnaz), Gq (Gnaq, Gna-11, Gna-14, Gna-15) and G12 (Gna-12, Gna-13) (B. Nurnberg et al., J. Mol. Med., 73:123-132 (1995)). They possess an endogenous GTP-ase activity allowing reversible functional coupling between ligand-bound receptors and downstream effectors such as enzymes and ion channels. G alpha proteins are complexed noncovalently with G beta and G gamma proteins as well as with their cognate 7-TM receptor(s). Receptor and signal specificity are controlled by the particular combination of G alpha, G beta (of which there are five known loci) and G gamma (seven known loci) subunits. Activation of the heterotrimeric complex by ligand bound receptor results in dissociation of the complex into G alpha monomers and G beta, gamma dimers which then transmit signals by associating with downstream effector proteins. The G alpha subunit is believed to be the subunit that contacts the 7-TM, and thus it is a focal point for the evolution of chimeric or evolved G alpha subunits that can transmit signals from mammalian 7-TM's to yeast downstream genes.

[0410] Yeast based bioassays for mammalian receptors will greatly facilitate the discovery of novel ligands. Kang et al. (Mol. Cell Biol. 10:2582-2590 (1990)) have described the partial complementation of yeast strains bearing mutations in SCG1 (GPA1), a homologue of the alpha subunits of G proteins involved in signal transduction in mammalian cells, by mammalian and hybrid yeast/mammalian G alpha proteins. These hybrids have partial function, such as complementing the growth defect in scg1 strains, but do not allow mating and hence do not fully complement function in the pheromone signal transduction pathway. Price et al. (Mol. Cell Biol. 15:6188-6195 (1995)) have expressed rat somatostatin receptor subtype 2 (SSTR2) in yeast and demonstrated transmission of ligand binding signals by this 7-TM receptor through yeast and chimeric mammalian/yeast G alpha subunits (“coupling”) to a HIS3 reporter gene under control of the pheromone responsive promoter FUS1, enabling otherwise HIS3(−) cells to grow on minimal medium lacking histidine.

[0411] Such strains are useful as reporter strains for mammalian receptors, but suffer from important limitations as exemplified by the study of Kang et al., where there appears to be a block in the transmission of signals from the yeast pheromone receptors to the mammalian G proteins. In general, to couple a mammalian 7-TM receptor to yeast signal transduction pathways one couples the mammalian receptor to yeast, mammalian, or chimeric G alpha proteins, and these will in turn productively interact with downstream components in the pathway to induce expression of a pheromone responsive promoter such as FUS1. Such functional reconstitution is commonly referred to as “coupling”.

[0412] The methods described herein can be used to evolve the coupling of mammalian 7-TM receptors to yeast signal transduction pathways. A typical approach is as follows: (1) clone a 7-TM of interest into a yeast strain with a modified pheromone response pathway similar to that described by Price (e.g., strains deficient in FAR1, a negative regulator of G₁ cyclins, and deficient in SST2 which causes the cells to be hypersensitive to the presence of pheromone), (2) construct libraries of chimeras between the mammalian G alpha protein(s) known or thought to interact with the 7-TM and the GPA1 or homologous yeast G alpha proteins, (3) place a selectable reporter gene such as HIS3 under control of the pheromone responsive promoter FUS1 (Price et al., Mol. Cell Biol. 15:6188-6195 (1995)) or, alternatively, place a screenable gene such as luciferase under the control of the FUS1 promoter, (4) transform library (2) into strain (3) (HIS(−)), (5) screen or select for expression of the reporter in response to the ligand of interest, for example by growing the library of transformants on minimal plates in the presence of ligand to demand HIS3 expression, and (6) recover the selected cells and apply RSR to evolve improved expression of the reporter under the control of the pheromone responsive promoter FUS1.

[0413] A second important consideration in evolving strains with optimized reporter constructs for signal transduction pathways of interest is optimizing the signal to noise ratio (the ratio of gene expression under inducing vs noninducing conditions). Many 7-TM pathways are leaky such that the maximal induction of a typical reporter gene is 5 to 10-fold over background. This range of signal to noise may be insufficient to detect small effects in many high throughput assays. Therefore, it is of interest to couple the 7-TM pathway to a second nonlinear amplification system that is tuned to be below but near the threshold of activation in the uninduced state. An example of a nonlinear amplification system is expression of genes driven by the lambda PL promoter. Complex cooperative interactions between lambda repressor bound at three adjacent sites in the cI promoter result in very efficient repression above a certain concentration of repressor. Below a critical threshold dramatic induction is seen and there is a window within which a small decrease in repressor concentration leads to a large increase in gene expression (Ptashne, A Genetic Switch: Phage Lambda and Higher Organisms, Blackwell Scientific Publ. Cambridge, Mass., 1992). Analogous effects are seen for some eukaryotic promoters such as those regulated by GAL4. Placing the expression of a limiting component of a transcription factor for such a promoter (GAL4) under the control of a GAL4 enhanced 7-TM responsive promoter results in small levels of induction of the 7-TM pathway signal being amplified to a much larger change in the expression of a reporter construct also under the control of a GAL4 dependent promoter.
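
The benefit of a cooperative second stage tuned just below its threshold can be illustrated with a Hill-type response curve. The sketch below is a simplified model and not a description of the lambda or GAL4 systems themselves. The Hill form, the coefficient of 4, the threshold of 1.0, and the basal input of 0.3 are all assumptions chosen only to show the effect.

```python
def hill_response(activator, threshold, hill_coefficient):
    """Fractional promoter output for a cooperative activator (Hill form)."""
    a = activator ** hill_coefficient
    return a / (threshold ** hill_coefficient + a)


def fold_induction(basal, induced, threshold, hill_coefficient):
    """Output ratio between induced and uninduced activator levels."""
    return (hill_response(induced, threshold, hill_coefficient)
            / hill_response(basal, threshold, hill_coefficient))


if __name__ == "__main__":
    basal = 0.3          # uninduced input, tuned below the threshold of 1.0
    induced = 5 * basal  # a 5-fold primary induction, as for a leaky pathway
    for n in (1, 2, 4):
        print(n, round(fold_induction(basal, induced, 1.0, n), 1))
```

In this model a 5-fold change at the input yields less than a 5-fold change at the output when the response is non-cooperative, but a much larger change when the response is cooperative and the basal level sits just below the threshold.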

[0414] An example of such a coupled system is to place GAL4 under control of the FUS1 pheromone responsive promoter and to have the intracellular GAL4 (itself a transcriptional enhancer) level feed back positively on itself by placing a GAL4 binding site upstream of the FUS1 promoter. A reporter gene is also put under the control of a GAL4 activated promoter. This system is designed so that GAL4 expression will nonlinearly self-amplify and co-amplify expression of a reporter gene such as luciferase upon reaching a certain threshold in the cell. RSR can be used to great advantage to evolve reporter constructs with the desired signaling properties, as follows: (1) A single plasmid construct is made which contains both the GAL4/pheromone pathway regulated GAL4 gene and the GAL4 regulated reporter gene. (2) This construct is mutagenized and transformed into the appropriately engineered yeast strain expressing a 7-TM and chimeric yeast/mammalian protein of interest. (3) Cells are stimulated with agonists and screened (or selected) based on the activity of the reporter gene. In a preferred format, luciferase is the reporter gene and activity is quantitated before and after stimulation with the agonist, thus allowing for a quantitative measurement of signal to noise for each colony. (4) Cells with improved reporter properties are recovered, the constructs are shuffled, and RSR is applied to further evolve the plasmid to give optimal signal to noise characteristics.
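The per-colony signal to noise measurement in step (3) can be sketched as a simple ranking. The function name, the input layout (colony identifier mapped to luciferase readings before and after agonist), and the background floor are assumptions made for illustration only.

```python
def rank_by_signal_to_noise(readings, min_background=1e-9):
    """Rank colonies by the ratio of induced to uninduced reporter activity.

    readings: dict mapping colony id -> (before_agonist, after_agonist)
    min_background: hypothetical floor that avoids division by zero
    Returns a list of (colony id, ratio), best ratio first.
    """
    ratios = {
        colony: after / max(before, min_background)
        for colony, (before, after) in readings.items()
    }
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up luminescence values, in arbitrary units.
    example = {"A": (10.0, 50.0), "B": (5.0, 100.0), "C": (20.0, 40.0)}
    for colony, ratio in rank_by_signal_to_noise(example):
        print(colony, ratio)
```

Colonies at the top of such a ranking would be the candidates recovered in step (4) for shuffling and further rounds of RSR.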

[0415] These approaches are general and illustrate how any component of a signal transduction pathway or transcription factor could be evolved using RSR and the screens and selections described above. For example, these specific methods could be used to evolve 7-TM receptors with specificity for novel ligands, specificity of nuclear receptors for novel ligands (for example to obtain herbicide or other small molecule-inducible expression of genes of interest in transgenic plants, such that a given set of genes can be induced upon treatment with a given chemical agent), specificity of transcription factors to be responsive to viral factors (thus inducing antiviral or lethal genes in cells expressing this transcription factor [transgenics or cells treated with gene therapy constructs]), or specificity of transcription factors for activity in cancer cells (for example p53 deficient cells, thus allowing one to infect with gene therapy constructs expressing conditionally lethal genes in a tumor specific fashion).

[0416] The following examples are offered by way of illustration, not by way of limitation.

EXPERIMENTAL EXAMPLES

[0417] I. Evolution of BIAP

[0418] A preferred strategy to evolve BIAP is as follows. A codon usage library is constructed from 60-mer oligonucleotides such that the central 20 bases of each oligo encode the wild-type protein sequence with degenerate codons. Preferably, very rare codons for the prokaryotic host of choice, such as E. coli, are not used. The 20 bases at each end of the oligo use non-degenerate, but preferred, codons in E. coli. The oligonucleotides are assembled into full-length genes as described above. The assembled products are cloned into an expression vector by techniques well known in the art. In some embodiments, the codon usage library is expressed with a library of secretory leader sequences, each of which directs the encoded BIAP protein to the E. coli periplasm. A library of leader sequences is used to optimize the combination of leader sequence and mutant. Examples of leader sequences are reviewed by Schatz et al. (Ann Rev. Genet. 24:215-248 (1990)). The cloned BIAP genes are expressed under the control of an inducible promoter such as the arabinose promoter. Arabinose-induced colonies are screened by spraying with a substrate for BIAP, bromo-chloro-indolyl phosphate (BCIP). The bluest colonies are picked visually and subjected to the RSR procedures described herein.

[0419] The oligonucleotides for construction of the codon usage library are listed in Table II. The corresponding locations of these promoters are provided in FIG. 1.

TABLE II

1. AACCCTCCAG TTCCGAACCC CATATGATGA TCACCCTGCG TAAACTGCCG
2. AACCCTCCAG TTCCGAACCC CATATGAAAA AAACCGCT
3. AACCCTCCAG TTCCGAACCC ATATACATAT GCGTGCTAAA
4. AACCCTCCAG TTCCGAACCC CATATGAAAT ACCTGCTGCC GACC
5. AACCCTCCAG TTCCGAACCC GATATACATA TGAAACAGTC
6. TGGTGTTATG TCTGCTCAGG CDATGGCDGT DGAYTTYCAY CTGGTTCCGG TTGAAGAGGA
7. GGCTGGTTTC GCTACCGTTG CDCARGCDGC DCCDAARGAY CTGGTTCCGG TTGAAGAGGA
8. CACCCCGATC GCTATCTCTT CYTTYGCDTC YACYGGYTCY CTGGTTCCGG TTGAAGAGGA
9. GCTGCTGGCT GCTCAGCCGG CDATGGCDAT GGAYATYGGY CTGGTTCCGG TTGAAGAGGA
10. TGCCGCTGCT GTTCACCCCG GTDACYAARG CDGCDCARGT DCTGGTTCCG GTTGAAGAGG A
11. CCCGGCTTTC TGGAACCGTC ARGCDGCDCA RGCDCTGGAC GTTGCTAAAA AACTGCAGCC
12. ACGTTATCCT GTTCCTGGGT GAYGGYATGG GYGTDCCDAC CGTTACCGCT ACCCGTATCC
13. AAACTGGGTC CGGAAACCCC DCTGGCDATG GAYCARTTYC CGTACGTTGC TCTGTCTAAA
14. GGTTCCGGAC TCTGCTGGTA CYGCDACYGC DTAYCTGTGC GGTGTTAAAG GTAACTACCG
15. CTGCTCGTTA CAACCAGTGC AARACYACYC GYGGYAAYGA AGTTACCTCT GTTATGAACC
16. TCTGTTGGTG TTGTTACCAC YACYCGYGTD CARCAYGCDT CTCCGGCTGG TGCTTACGCT
17. GTACTCTGAC GCTGACCTGC CDGCDGAYGC DCARATGAAC GGTTGCCAGG ACATCGCTGC
18. ACATCGACGT TATCCTGGGT GGYGGYCGYA ARTAYATGTT CCCGGTTGGT ACCCCGGACC
19. TCTGTTAACG GTGTTCGTAA RCGYAARCAR AAYCTGGTDC AGGCTTGGCA GGCTAAACAC
20. GAACCGTACC GCTCTGCTGC ARGCDGCDGA YGAYTCYTCT GTTACCCACC TGATGGGTCT
21. AATACAACGT TCAGCAGGAC CAYACYAARG AYCCDACYCT GCAGGAAATG ACCGAAGTTG
22. AACCCGCGTG GTTTCTACCT GTTYGTDGAR GGYGGYCGYA TCGACCACGG TCACCACGAC
23. GACCGAAGCT GGTATGTTCG AYAAYGCDAT YGCDAARGCT AACGAACTGA CCTCTGAACT
24. CCGCTGACCA CTCTCACGTT TTYTCYTTYG GYGGYTAYAC CCTGCGTGGT ACCTCTATCT
25. GCTCTGGACT CTAAATCTTA YACYTCYATY CTGTAYGGYA ACGGTCCGGG TTACGCTCTG
26. CGTTAACGAC TCTACCTCTG ARGAYCCDTC YTAYCARCAG CAGGCTGCTG TTCCGCAGGC
27. AAGACGTTGC TGTTTTCGCT CGYGGYCCDC ARGCDCAYCT GGTTCACGGT GTTGAAGAAG
28. ATGGCTTTCG CTGGTTGCGT DGARCCDTAY ACYGAYTGYA ACCTGCCGGC TCCGACCACC
29. TGCTCACCTG GCTGCTTMAC CDCCDCCDCT GGCDCTGCTG GCTGGTGCTA TGCTGCTCCT C
30. TTCCGCCTCT AGAGAATTCT TARTACAGRG THGGHGCCAG GAGGAGCAGC ATAGCACCAG CC
31. AAGCAGCCAG GTGAGCAGCG TCHGGRATRG ARGTHGCGGT GGTCGGAGCC GGCAGGTT
32. CGCAACCAGC GAAAGCCATG ATRTGHGCHA CRAARGTYTC TTCTTCAACA CCGTGAACCA
33. GCGAAAACAG CAACGTCTTC RCCRCCRTGR GTYTCRGAHG CCTGCGGAAC AGCAGCCTGC
34. AGAGGTAGAG TCGTTAACGT CHGGRCGRGA RCCRCCRCCC AGAGCGTAAC CCGGACCGTT
35. AAGATTTAGA GTCCAGAGCT TTRGAHGGHG CCAGRCCRAA GATAGAGGTA CCACGCAGGG
36. ACGTGAGAGT GGTCAGCGGT HACCAGRATC AGRGTRTCCA GTTCAGAGGT CAGTTCGTTA
37. GAACATACCA GCTTCGGTCA GHGCCATRTA HGCYTTRTCG TCGTGGTGAC CGTGGTCGAT
38. GGTAGAAACC ACGCGGGTTA CGRGAHACHA CRCGCAGHGC AACTTCGGTC ATTTCCTGCA
39. TCCTGCTGAA CGTTGTATTT CATRTCHGCH GGYTCRAACA GACCCATCAG GTGGGTAACA
40. CAGCAGAGCG GTACGGTTCC AHACRTAYTG HGCRCCYTGG TGTTTAGCCT GCCAAGCCTG
41. TACGAACACC GTTAACAGAA GCRTCRTCHG GRTAYTCHGG GTCCGGGGTA CCAACCGGGA
42. CCCAGGATAA CGTCGATGTC CATRTTRTTH ACCAGYTGHG CAGCGATGTC CTGGCAACCG
43. CAGGTCAGCG TCAGAGTACC ARTTRCGRTT HACRGTRTGA GCGTAAGCAC CAGCCGGAGA
44. TGGTAACAAC ACCAACAGAT TTRCCHGCYT TYTTHGCRCG GTTCATAACA GAGGTAACTT
45. CACTGGTTGT AACGAGCAGC HGCRGAHACR CCRATRGTRC GGTAGTTACC TTTAACACCG
46. ACCAGCAGAG TCCGGAACCT GRCGRTCHAC RTTRTARGTT TTAGACAGAG CAACGTACGG
47. GGGTTTCCGG ACCCAGTTTA CCRTTCATYT GRCCYTTCAG GATACGGGTA GCGGTAACGG
48. CCCAGGAACA GGATAACGTT YTTHGCHGCR GTYTGRATHG GCTGCAGTTT TTTAGCAACG
49. ACGGTTCCAG AAAGCCGGGT CTTCCTCTTC AACCGGAACC AG
50. CCTGAGCAGA CATAACACCA GCHGCHACHG CHACHGCCAG CGGCAGTTTA CGCAGGGTGA
51. ACCGGGGTGA ACAGCAGCGG CAGCAGHGCC AGHGCRATRG TRGACTGTTT CATATGTATA TC
52. GCCGGCTGAG CAGCCAGCAG CAGCAGRCCH GCHGCHGCGG TCGGCAGCAG GTAGTTTCA
53. AAGAGATAGC GATCGGGGTG GTCAGHACRA TRCCCAGCAG TTTAGCACGC ATATGTATAT
54. CAACGGTAGC GAAACCAGCC AGHGCHACHG CRATHGCRAT AGCGGTTTTT TTCATATG
55. AGAATTCTCT AGAGGCGGAA ACTCTCCAAC TCCCAGGTT
56. TGAGAGGTTG AGGGTCCAAT TGGGAGGTCA AGGCTTGGG
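The number of distinct sequences encoded by a degenerate oligonucleotide such as those in Table II can be estimated by multiplying the number of bases allowed at each position. The sketch below assumes the degenerate letters in Table II follow the standard IUPAC ambiguity codes (for example D = A/G/T, Y = C/T, R = A/G, H = A/C/T, M = A/C); the function name is hypothetical.

```python
# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT",
}


def count_variants(oligo):
    """Number of distinct DNA sequences encoded by a degenerate oligo.

    Spaces (the 10-base grouping used in the table) are ignored.
    """
    total = 1
    for base in oligo.replace(" ", "").upper():
        total *= len(IUPAC[base])
    return total


if __name__ == "__main__":
    # Oligonucleotide 6 from Table II.
    oligo_6 = ("TGGTGTTATG TCTGCTCAGG CDATGGCDGT "
               "DGAYTTYCAY CTGGTTCCGG TTGAAGAGGA")
    print(count_variants(oligo_6))
```

For oligonucleotide 6 the degenerate positions are confined to the central 20 bases (three D and three Y positions), consistent with the design described for the codon usage library.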

[0420] II. Mammalian Surface Display

[0421] During an immune response antibodies naturally undergo a process of affinity maturation resulting in mutant antibodies with improved affinities for their cognate antigens. This process is driven by somatic hypermutation of antibody genes coupled with clonal selection (Berek and Milstein, Immun. Rev. 96:23-41 (1987)). Patten et al. (Science 271:1086-1091 (1996)) have reconstructed the progression of a catalytic antibody from the germline sequence, which binds a p-nitrophenylphosphonate hapten with an affinity of 135 micromolar, to the affinity matured sequence which has acquired nine somatic mutations and binds with an affinity of 10 nanomolar. The affinity maturation of this antibody can be recapitulated and improved upon using cassette mutagenesis of the CDR's (or random mutagenesis such as with PCR), mammalian display, FACS selection for improved binding, and RSR to rapidly evolve improved affinity by recombining mutations encoding improved binding.

[0422] Genomic antibody expression shuttle vectors similar to those described by Gascoigne et al. (Proc. Natl. Acad. Sci. (U.S.A.) 84:2936-2940 (1987)) are constructed such that libraries of mutant V region exons can be readily cloned into the shuttle vectors. The kappa construct is cloned onto a plasmid encoding puromycin resistance and the heavy chain is cloned onto a neomycin resistance encoding vector. The cDNA derived variable region sequences encoding the mature and germline heavy and light chain V regions are reconfigured by PCR mutagenesis into genomic exons flanked by Sfi I sites with complementary Sfi I sites placed at the appropriate locations in the genomic shuttle vectors. The oligonucleotides used to create the intronic Sfi I sites flanking the VDJ exon are:

5′ Sfi I: 5′-TTCCATTTCA TACATGGCCG AAGGGGCCGT GCCATGAGGA TTTT-3′
3′ Sfi I: 5′-TTCTAAATG CATGTTGGCC TCCTTGGCCG GATTCTGAGC CTTCAGGACC A-3′

Standard PCR mutagenesis protocols are applied to produce libraries of mutants wherein the following sets of residues (numbered according to Kabat, Sequences of Proteins of Immunological Interest, U.S. Dept of Health and Human Services, 1991) are randomized to NNK codons (GATC,GATC,GC):

Chain   CDR   Mutated residues
V-L     1     30, 31, 34
V-L     2     52, 53, 55
V-H     2     55, 56, 65
V-H     “4”   74, 76, 78
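The size of each three-residue randomized library can be estimated by enumerating the randomized codons. The sketch below uses the standard genetic code; the function names are hypothetical. Under the IUPAC convention K denotes G or T, whereas the parenthetical in the text lists G and C at the third position; the `third_position` parameter is therefore left adjustable, and the defaults shown are an assumption.

```python
from itertools import product

# Standard genetic code, indexed in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: aa
    for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def randomized_codons(third_position="GT"):
    """All codons with any base at positions 1 and 2 and a restricted third base."""
    return [a + b + c for a, b, c in product("ACGT", "ACGT", third_position)]


def library_summary(n_positions, third_position="GT"):
    """Codon, amino acid, and library-size counts for n randomized residues."""
    codons = randomized_codons(third_position)
    amino_acids = {CODON_TABLE[c] for c in codons} - {"*"}
    return {
        "codons_per_position": len(codons),
        "amino_acids_per_position": len(amino_acids),
        "stop_codons": sorted(c for c in codons if CODON_TABLE[c] == "*"),
        "dna_variants": len(codons) ** n_positions,
        "protein_variants": len(amino_acids) ** n_positions,
    }


if __name__ == "__main__":
    # Each library in the table randomizes three residues.
    print(library_summary(3))
```

Either choice of third position yields 32 codons per residue, so the DNA-level library size for three randomized residues is the same in both readings.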

[0423] Stable transfectant lines are made for each of the two light and heavy chain constructs (mature and germline) using the B cell myeloma AG8-653 (a gift from J. Kearney) as a host using standard electroporation protocols. Libraries of mutant plasmids encoding the indicated libraries of V-L mutants are transfected into the stable transfectant expressing the germline V-H; and the V-H mutants are transfected into the germline V-L stable transfectant line. In both cases, the libraries are introduced by protoplast fusion (Sambrook et al., Molecular Cloning, CSH Press (1987)) to ensure that the majority of transfected cells receive one and only one mutant plasmid sequence (which would not be the case for electroporation where the majority of the transfected cells would receive many plasmids, each expressing a different mutant sequence).

[0424] The p-nitrophenylphosphonate hapten (JWJ-1) recognized by this antibody is synthesized as described by Patten et al. (Science 271:1086-1091 (1996)). JWJ-1 is coupled directly to 5-(((2-aminoethyl)thio)acetyl) fluorescein (Molecular Probes, Inc.) by formation of an amide bond using a standard coupling chemistry such as EDAC (March, Advanced Organic Chemistry, Third edition, John Wiley and Sons, 1985) to give a monomeric JWJ-1-FITC probe. A “dimeric” conjugate (two molecules of JWJ-1 coupled to a FACS marker) is made in order to get a higher avidity probe, thus making low affinity interactions (such as with the germline antibody) more readily detected by FACS. This is generated by staining with Texas Red conjugated to an anti-fluorescein antibody in the presence of two equivalents of JWJ-1-FITC. The bivalent structure of IgG then provides a homogeneous bivalent reagent. A spin column is used to remove excess JWJ-1-FITC molecules that are not bound to the anti-FITC reagent. A tetravalent reagent is made as follows. One equivalent of biotin is coupled with EDAC to two equivalents of ethylenediamine, and this is then coupled to the free carboxylate on JWJ-1. The biotinylated JWJ-1 product is purified by ion exchange chromatography and characterized by mass spectrometry. FITC labelled avidin is incubated with the biotinylated JWJ-1 in order to generate a tetravalent probe.

[0425] The FACS selection is performed as follows, according to a protocol similar to that of Panka et al. (Proc. Natl. Acad. Sci. (U.S.A.) 85:3080-3084 (1988)). After transfection of libraries of mutant antibody genes by the method of protoplast fusion (with recovery for 36-72 hours), the cells are incubated on ice with fluorescently labelled hapten. The incubation is done on ice to minimize pinocytosis of the FITC conjugate which may contribute to nonspecific background. The cells are then sorted on the FACS either with or without a washing step. FACSing without a washing step is preferable because the off rate for the germline antibody prior to affinity maturation is expected to be very fast (>0.1 sec⁻¹; Patten et al., Science 271:1086-1091 (1996)); a washing step adds a complicating variable. The brightest 0.1-10% of the cells are collected.

[0426] Four parameters are manipulated to optimize the selection for increased binding: monomeric vs dimeric vs tetrameric hapten, concentration of hapten used in the staining reaction (low concentration selects for high affinity Kd's), time between washing and FACS (longer time selects for low off rates), and selectivity in the gating (i.e. take the top 0.1% to 10%, more preferably the top 0.1%). The constructs expressing the germline, mature, and both combinations of half germline are used as controls to optimize this selectivity.
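The gating parameter can be illustrated with a minimal sketch that selects the brightest fraction of events from a list of fluorescence intensities. The function name and data layout are assumptions for illustration; an actual sorter applies its gates in instrument software.

```python
def gate_brightest(intensities, top_fraction=0.001):
    """Return indices of the brightest events, brightest first.

    intensities: per-cell fluorescence values (arbitrary units)
    top_fraction: fraction of events to keep, e.g. 0.001 for the top 0.1%
    At least one event is always returned.
    """
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    n_keep = max(1, round(len(intensities) * top_fraction))
    ranked = sorted(range(len(intensities)),
                    key=lambda i: intensities[i], reverse=True)
    return ranked[:n_keep]


if __name__ == "__main__":
    # Synthetic intensities: event i has intensity i.
    events = list(range(1000))
    print(gate_brightest(events, top_fraction=0.01))
```

Tightening `top_fraction` from 0.1 toward 0.001 corresponds to the more stringent end of the 0.1% to 10% gating range described above.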

[0427] Plasmids are recovered from the FACS selected cells by the transformation of an E. coli host with Hirt supernatants. Alternatively, the mutant V gene exons are PCR-amplified from the FACS selected cells. The recovered V gene exons are subjected to RSR, recloned into the corresponding genomic shuttle vector, and the procedure recursively applied until the mean fluorescence intensity has increased. A relevant positive control for improved binding is transfection with the affinity matured 48G7 exons (Patten et al., op. cit.).

[0428] In a further experiment, equal numbers of germline and each of the two half germline transfectants are mixed. The brightest cells are selected under conditions described above. The V genes are recovered by PCR, recloned into expression vectors, and co-transfected, either two plasmids per E. coli followed by protoplast fusion, or by bulk electroporation. The mean fluorescent intensity of the transfectants should increase due to enrichment of mature relative to germline V regions.

[0429] This methodology can be applied to evolve any receptor-ligand or binding partner interaction. Natural expression formats can be used to express libraries of mutants of any receptor for which one wants to improve the affinity for the natural or novel ligands. Typical examples would be improvement of the affinity of T cell receptors for ligands of interest (e.g., MHC/tumor peptide antigen complexes) or TNF receptor for TNF (soluble forms of TNF receptors are used therapeutically to neutralize TNF activity).

[0430] This format can also be used to select for mutant forms of ligands by expressing the ligand in a membrane bound form with an engineered membrane anchor by a strategy analogous to that of Wettstein et al. (J. Exp. Med. 174:219-28 (1991)). FACS selection is then performed with fluorescently labelled receptor. In this format one could, for example, evolve improved receptor antagonists from naturally occurring receptor antagonists (IL1 receptor antagonist, for example). Mutant forms of agonists with improved affinity for their cognate receptors could also be evolved in this format. These mutants would be candidates for improved agonists or potent receptor antagonists, analogous to reported antagonistic mutant forms of IL3.

[0431] III. Evolution of Alpha Interferon

[0432] There are at hand 18 known non-allelic human interferon-alpha (IFN-α) genes, with highly related primary structures (78-95% identical) and with a broad range of biological activities. Many hybrid interferons with interesting biological activities differing from the parental molecules have been described (reviewed by Horisberger and Di Marco, Pharm. Ther. 66:507-534 (1995)). A consensus human alpha interferon, IFN-Con1, has been constructed synthetically wherein the most common residue in fourteen known IFN-α's has been put at each position, and it compares favorably with the naturally occurring interferons (Ozes et al., J. Interferon Res. 12:55-59 (1992)). This IFN contains 20 amino acid changes relative to IFN-α2a, the IFN-α to which it is most closely related. IFN-Con1 has 10-fold higher specific antiviral activity than any known natural IFN subtype. IFN-α Con1 has in vitro activities 10 to 20 fold higher than that of recombinant IFN α-2a (the major IFN used clinically) in antiviral, antiproliferative and NK cell activation assays. Thus, there is considerable interest in producing interferon hybrids which combine the most desirable traits from two or more interferons. However, given the enormous number of potential hybrids and the lack of a crystal structure of IFN-α or of the IFN-α receptor, there is a perceived impasse in the development of novel hybrids (Horisberger and Di Marco, Pharm. Ther. 66:507-534 (1995)).

[0433] The biological effects of IFN-α's are diverse, and include such properties as induction of antiviral state (induction of factors that arrest translation and degrade mRNA); inhibition of cell growth; induction of Class I and Class II MHC; activation of monocytes and macrophages; activation of natural killer cells; activation of cytotoxic T cells; modulation of Ig synthesis in B cells; and pyrogenic activity.

[0434] The various IFN-α subtypes have unique spectra of activities on different target cells and unique side effect profiles (Ortaldo et al., Proc. Natl. Acad. Sci. (U.S.A.) 81:4926-4929 (1984); Overall et al., J. Interferon Res. 12:281-288 (1992); Fish and Stebbing, Biochem. Biophys. Res. Comm. 112:537-546 (1983); Weck et al., J. Gen. Virol. 57:233-237 (1981)). For example, human IFNα has very mild side effects but low antiviral activity. Human IFNα8 has very high antiviral activity, but relatively severe side effects. Human IFNα7 lacks NK activity and blocks NK stimulation by other IFNα's. Human IFN-α J lacks the ability to stimulate NK cells, but it can bind to the IFN-α receptor on NK cells and block the stimulatory activity of IFN-αA (Langer et al., J. Interferon Res. 6:97-105 (1986)).

[0435] The therapeutic applications of interferons are limited by diverse and severe side effect profiles which include flu-like symptoms, fatigue, neurological disorders including hallucination, fever, hepatic enzyme elevation, and leukopenia. The multiplicity of effects of IFN-α's has stimulated the hypothesis that there may be more than one receptor or a multicomponent receptor for the IFN-α family (R. Hu et al., J. Biol. Chem. 268:12591-12595 (1993)). Thus, the existence of abundant naturally occurring diversity within the human alpha IFN's (and hence a large sequence space of recombinants) along with the complexity of the IFN-α receptors and activities creates an opportunity for the construction of superior hybrids.

[0436] A. Complexity of the Sequence Space

[0437] FIG. 2 shows the protein sequences of 11 human IFN-α's. The differences from consensus are indicated. Those positions where a degenerate codon can capture all of the diversity are indicated with an asterisk. Examination of the aligned sequences reveals that there are 57 positions with two, 15 positions with three, and 4 positions with four possible amino acids encoded in this group of alpha interferon genes. Thus, the potential diversity encoded by permutation of all of this naturally occurring diversity is: 2⁵⁷×3¹⁵×4⁴=5.3×10²⁶. Among these hybrids, of the 76 polymorphisms spread over a total of 175 sites in the 11 interferon genes, 171 of the 175 changes can be incorporated into homologue libraries using single degenerate codons at the corresponding positions. For example, Arg, Trp and Gly can all be encoded by the degenerate codon [A,T,G]GG. Using such a strategy, 1.3×10²⁵ hybrids can be captured with a single set of degenerate oligonucleotides. As is evident from Tables III to VI, 27 oligonucleotides are sufficient to shuffle all eleven human alpha interferons. Virtually all of the natural diversity is thereby encoded and fully permuted due to degeneracies in the nine “block” oligonucleotides in Table V.
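The diversity figure and the degenerate codon example above can be checked with a short calculation. The sketch below takes the position counts from the text and uses the standard genetic code; the variable and function names are hypothetical.

```python
from itertools import product

# Standard genetic code, indexed in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: aa
    for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of alternative amino acids -> number of positions (from the text).
VARIABLE_POSITIONS = {2: 57, 3: 15, 4: 4}


def potential_diversity(positions):
    """Product over all variable positions of the number of alternatives."""
    total = 1
    for n_alternatives, n_positions in positions.items():
        total *= n_alternatives ** n_positions
    return total


def amino_acids_for(first_bases, rest):
    """Amino acids encoded when the first base is degenerate, e.g. [A,T,G]GG."""
    return {CODON_TABLE[base + rest] for base in first_bases}


if __name__ == "__main__":
    print(f"{potential_diversity(VARIABLE_POSITIONS):.1e}")
    print(sorted(amino_acids_for("ATG", "GG")))  # one-letter codes
```

The three position classes sum to the 76 polymorphic positions cited in the text, and the degenerate codon [A,T,G]GG expands to AGG, TGG and GGG.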

[0438] B. Properties of a “Coarse Grain” Search of Homologue Sequence Space

[0439] The modelled structure of IFN alpha (Kontsek, Acta Vir. 38:345-360 (1994)) has been divided into nine segments based on a combination of criteria of maintaining secondary structure elements as single units and placing the segment boundaries in regions of high identity. Hence, one can capture the whole family with a single set of mildly degenerate oligonucleotides. Table III and FIG. 2 give the precise locations of these boundaries at the protein and DNA levels respectively. It should be emphasized that this particular segmentation scheme is arbitrary and that other segmentation schemes could also be pursued. The general strategy does not depend on placement of recombination boundaries at regions of high identity between the family members or on any particular algorithm for breaking the structure into segments.

TABLE III
Segmentation Scheme for Alpha Interferon

Segment   Amino Acids   # Alleles   # Permutations of all Sequence Variations
1         1-21          5           1024
2         22-51         10          6.2 × 10⁴
3         52-67         6           96
4         68-80         7           1024
5         81-92         7           192
6         93-115        10          2.5 × 10⁵
7         116-131       4           8
8         132-138       4           8
9         139-167       9           9216

[0440] Many of the IFN's are identical over some of the segments, and thus there are fewer than eleven different “alleles” of each segment. Thus, a library consisting of the permutations of the segment “alleles” would have a potential complexity of 2.1×10⁷ (5 segment #1's times 10 segment #2's × . . . × 9 segment #9's). This is far more than can be examined in most of the screening procedures described, and thus this is a good problem for using RSR to search the sequence space.
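The coarse grain library complexity follows directly from the allele counts in Table III. A minimal sketch, with hypothetical names and the allele counts read from the table:

```python
from math import prod

# Segment number -> number of distinct "alleles" (Table III).
ALLELES_PER_SEGMENT = {1: 5, 2: 10, 3: 6, 4: 7, 5: 7, 6: 10, 7: 4, 8: 4, 9: 9}


def coarse_grain_complexity(alleles):
    """Number of full-length hybrids obtained by permuting segment alleles."""
    return prod(alleles.values())


if __name__ == "__main__":
    print(f"{coarse_grain_complexity(ALLELES_PER_SEGMENT):.1e}")
```

The product of the nine allele counts reproduces the figure of roughly 2.1×10⁷ quoted above.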

[0441] C. Detailed Strategies for Using RSR to Search the IFN-alpha Homologue Sequence Space

[0442] The methods described herein for oligo directed shuffling (i.e. bridge oligonucleotides) are employed to construct libraries of interferon alpha hybrids, and the general methods described above are employed to screen or select these mutants for improved function. As there are numerous formats in which to screen or select for improved interferon activity, many of which depend on the unique properties of interferons, exemplary IFN based assays are described below.

[0443] D. A Protocol for a Coarse Grain Search of Hybrid IFN Alpha Sequence Space

[0444] In brief, libraries are constructed wherein the 11 homologous forms of the nine segments are permuted (note that in many cases two homologues are identical over a given segment). All nine segments are PCR-amplified out of all eleven IFN alpha genes with the eighteen oligonucleotides listed in Table IV, and reassembled into full length genes with oligo directed recombination. An arbitrary number of clones from the library, e.g., 1000, are prepared in a 96-well expression/purification format. The clones are screened for hybrids with the most potent antiviral activities. Nucleic acid is recovered by PCR amplification, and subjected to recombination using bridge oligonucleotides. These steps are repeated until candidates with desired properties are obtained.

[0445] E. Strategies for Examining the Space of >10²⁶ Fine Grain Hybrids

[0446] In brief, each of the nine segments is synthesized with one degenerate oligo per segment. Degeneracies are chosen to capture all of the IFN-alpha diversity that can be captured with a single degenerate codon without adding any non-natural sequence. A second set of degenerate oligonucleotides encoding the nine segments is generated wherein all of the natural diversity is captured, but additional non-natural mutations are included at positions where necessitated by the constraints of the genetic code. In most cases all of the diversity can be captured with a single degenerate codon; in some cases a degenerate codon will capture all of the natural diversity but will add one non-natural mutation; at a few positions it is not possible to capture the natural diversity without putting in a highly degenerate codon which will create more than one non-natural mutation. It is at these positions that this second set of oligonucleotides will differ from the first set by being more inclusive. Each of the nine synthetic segments is then amplified by PCR with the 18 PCR oligonucleotides. Full length genes are generated using the oligo directed recombination method, transfected into a host, and assayed for hybrids with desired properties. The best hybrids (e.g., the top 10%, 1% or 0.1%; preferably the top 1%) are subjected to RSR and the process repeated until a candidate with the desired properties is obtained.

[0447] F. “Non-gentle” Fine Grain Search

[0448] On the one hand, one could make libraries wherein each segment is derived from the degenerate synthetic oligonucleotides which will encode random permutations of the homologue diversity. In this case, the initial library will very sparsely search the space of >10²⁵ fine grain hybrids that are possible with this family of genes. One could proceed by breeding positives together from this search. However, there would be a large number of differences between independent members of such libraries, and consequently the breeding process would not be very “gentle” because pools of relatively divergent genes would be recombined at each step.

[0449] G. “Gentle” Fine Grain Search

[0450] One way to make this approach more “gentle” would be to obtain a candidate starting point and to gently search from there. This starting point could be either one of the natural IFN-alpha's (such as IFN alpha-2a which is the one that is being used most widely therapeutically), the characterized IFN-Con1 consensus interferon, or a hit from screening the shuffled IFN-alpha's described above. Given a starting point, one would make separate libraries wherein one breeds the degenerate segment libraries one at a time into the founder sequence. Improved hits from each library would then be bred together to gently build up mutations all throughout the molecule.

[0451] H. Functional Cellular Assays

[0452] The following assays, well known in the art, are used to screen IFN alpha mutants: inhibition of viral killing (standard error of 30-50%); inhibition of plaque forming units (very low standard error; can measure small effects); reduced viral yield (useful for nonlethal, nonplaque forming viruses); inhibition of cell growth (³H-thymidine uptake assay); activation of NK cells to kill tumor cells; suppression of tumor formation by human IFN administered to nude mice engrafted with human tumors (skin tumors, for example).

[0453] Most of these assays are amenable to high throughput screening. Libraries of recombinant IFN alpha mutants are expressed and purified in high throughput formats such as expression, lysis and purification in a 96-well format using anti-IFN antibodies or an epitope tag and affinity resin. The purified IFN preparations are screened in a high throughput format, scored, and the mutants encoding the highest activities of interest are subjected to further mutagenesis, such as RSR, and the process repeated until a desired level of activity is obtained.

[0454] I. Phage Display

[0455] Standard phage display formats are used to display biologically active IFN. Libraries of chimeric IFN genes are expressed in this format and are selected (positively or negatively) for binding (or reduced binding) to one or more purified IFN receptor preparations or to one or more IFN receptor expressing cell types.

[0456] J. GFP or Luciferase Under Control of IFN-Alpha Dependent Promoter

[0457] Protein expressed by mutants can be screened in high throughput format on a reporter cell line which expresses GFP or luciferase under the control of an IFN alpha responsive promoter, such as an MHC Class I promoter driving GFP expression.

[0458] K. Stimulation of Target Cells with Intact Infectious Particles

[0459] Purification of active IFN will limit the throughput of the assays described above. Expression of active IFN alpha on filamentous phage M13 would allow one to obtain homogenous preparations of IFN mutants in a format where thousands or tens of thousands of mutants could readily be handled. Gram et al. (J. Imm. Meth. 161:169-176 (1993)) have demonstrated that human IL3, a cytokine with a protein fold similar in topology to IFN alpha, can be expressed on the surface of M13 and that the resultant phage can present active IL3 to IL3 dependent cell lines. Similarly, Saggio et al. (Gene 152:35-39 (1995)) have shown that human ciliary neurotrophic factor, a four helix bundle cytokine, is biologically active when expressed on phage at concentrations similar to those of the soluble cytokine. Analogously, libraries of IFN alpha mutants on M13 can be expressed and lysates of defined titre used to present biologically active IFN in the high throughput assays and selections described herein.

[0460] The following calculation supports the feasibility of applying this technology to IFN alpha. Assuming (1) titres of 1×10¹⁰ phage/ml with five active copies of interferon displayed per phage, and (2) that the displayed interferon is equivalently active to soluble recombinant interferon (it may well be more potent due to multi-valency), the question then is whether one can reasonably expect to see biological activity.

(1×10¹⁰ phage/ml) × (5 IFN molecules/phage) × (1 mole/6×10²³ molecules) × (26,000 gm/mole) × (10⁹ ng/gm) = 2.2 ng/ml

[0461] The range of concentration used in biological assays is: 1 ng/ml for NK activation, 0.1-10 ng/ml for antiproliferative activity on Eskol cells, and 0.1-1 ng/ml on Daudi cells (Ozes et al., J. Interferon Res. 12:55-59 (1992)). Although some subtypes are glycosylated, interferon alpha2a and consensus interferon are expressed in active recombinant form in E. coli, so at least these two do not require glycosylation for activity. Thus, IFN alpha expressed on filamentous phage is likely to be biologically active as phage lysates without further purification. Libraries of IFN chimeras are expressed in phage display formats and scored in the assays described above and below to identify mutants with improved properties to be put into further rounds of RSR.
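The feasibility calculation can be reproduced and compared against the assay ranges with a few lines of arithmetic. The sketch below uses the values stated in the text (including the rounded figure of 6×10²³ molecules per mole); the function name and parameter names are hypothetical.

```python
MOLECULES_PER_MOLE = 6e23   # rounded value used in the text
NG_PER_GRAM = 1e9


def displayed_ifn_ng_per_ml(phage_per_ml=1e10, copies_per_phage=5,
                            grams_per_mole=26000.0):
    """Mass concentration of phage-displayed IFN in a lysate, in ng/ml."""
    moles_per_ml = phage_per_ml * copies_per_phage / MOLECULES_PER_MOLE
    return moles_per_ml * grams_per_mole * NG_PER_GRAM


if __name__ == "__main__":
    conc = displayed_ifn_ng_per_ml()
    print(f"{conc:.1f} ng/ml")
    # Assay ranges quoted in the text (ng/ml).
    ranges = {"Eskol antiproliferative": (0.1, 10.0), "Daudi": (0.1, 1.0)}
    for assay, (low, high) in ranges.items():
        print(assay, "lysate at or above lower bound:", conc >= low)
```

Changing `copies_per_phage` gives a rough sense of how higher valency display formats would shift the effective concentration under the same assumptions.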

[0462] When one phage is sufficient to activate one cell due to the high valency state of the displayed protein (five per phage in the gene III format; hundreds per phage in the gene VIII format; tens in the lambda gene V format), then a phage lysate can be used directly at suitable dilution to stimulate cells with a GFP reporter construct under the control of an IFN responsive promoter. Assuming that the phage remain attached after stimulation, expression and FACS purification of the responsive cells, one could then directly FACS purify hybrids with improved activity from very large libraries (up to and perhaps larger than 10⁷ phage per FACS run).

[0463] A second way in which FACS is used to advantage in this format is the following. Cells can be stimulated in a multiwell format with one lysate per well and a GFP type reporter construct. All stimulated cells are FACS purified to collect the brightest cells, and the IFN genes recovered and subjected to RSR, with iteration of the protocol until the desired level of improvement is obtained. In this protocol the stimulation is performed with individual concentrated lysates and hence the requirement that a single phage be sufficient to stimulate the cell is relaxed. Furthermore, one can gate to collect the brightest cells which, in turn, should have the most potent phage attached to them.

[0464] L. Cell Surface Display Protocol for IFN Alpha Mutants

[0465] A sample protocol follows for the cell surface display of IFN alpha mutants. This form of display has at least two advantages over phage display. First, the protein is displayed by a eukaryotic cell and hence can be expressed in a properly glycosylated form which may be necessary for some IFN alphas (and other growth factors). Secondly, it is a very high valency display format and is preferred in detecting activity from very weakly active mutants.

[0466] In brief, a library of mutant IFNs is constructed wherein a polypeptide signal for addition of a phosphoinositol tail has been fused to the carboxyl terminus, thus targeting the protein for surface expression (Wettstein et al., J. Exp. Med. 174:219-28 (1991)). The library is used to transfect the reporter cells described above (luciferase reporter gene) in a microtiter format. Positives are detected with a charge-coupled device (CCD) camera. Nucleic acids are recovered either by Hirt extraction and retransformation of the host or by PCR, and are subjected to RSR for further evolution.

[0467] M. Autocrine Display Protocol for Viral Resistance

[0468] A sample protocol follows for the autocrine display of IFN alpha mutants. In brief, a library of IFN mutants is generated in a vector which allows for induction of expression (e.g., a metallothionein promoter) and efficient secretion. The recipient cell line carrying an IFN responsive reporter cassette [GFP or luciferase] is induced by transfection with the mutant IFN constructs. Mutants which stimulate the IFN responsive promoter are detected by FACS or CCD camera.

[0469] A variation on this format is to challenge transfectants with virus and select for survivors. One could do multiple rounds of viral challenge and outgrowth on each set of transfectants prior to retrieving the genes. Multiple rounds of killing and outgrowth allow an exponential amplification of a small advantage and hence provide an advantage in detecting small improvements in viral killing.

TABLE IV
Oligonucleotides needed for blockwise recombination: 18 oligonucleotides for alpha interferon shuffling

 1. 5′-TGT[G/A]ATCTG[C/T]CT[C/G]AGACC
 2. 5′-GGCACAAATG[G/A/C]G[A/C]AGAATCTCTC
 3. 5′-AGAGATTCT[G/T]C[C/T/G]CATTTGTGCC
 4. 5′-CAGTTCCAGAAG[A/G]CT[G/C][C/A]AGCCATC
 5. 5′-GATGGCT[T/G][G/C]AG[T/C]CTTCTGGAACTG
 6. 5′-CTTCAATCTCTTCA[G/C]CACA
 7. 5′-TGTG[G/C]TGAAGAGATTGAAG
 8. 5′-GGA[T/A][G/C]AGA[C/G][C/G]CTCCTAGA
 9. 5′-TCTAGGAG[G/C][G/C]TCT[G/C][T/A]TCC
10. 5′-GAACTT[T/G/A][T/A]CCAGCAA[A/C]TGAAT
11. 5′-ATTCA[T/G]TTGCTGG[A/T][A/T/C]AAGTTC
12. 5′-GGACT[T/C]CATCCTGGCTGTG
13. 5′-CACAGCCAGGATG[G/A]AGTCC
14. 5′-AAGAATCACTCTTTATCT
15. 5′-AGATAAAGAGTGATTCTT
16. 5′-TGGGAGGTTGTCAGAGCAG
17. 5′-CTGCTCTGACAACCTCCCA
18. 5′-TCA[A/T]TCCTT[C/A]CTC[T/C]TTAA
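The bracket notation of Table IV, in which each bracket lists the bases mixed at one position, can be converted to single-letter IUPAC ambiguity codes, and the number of distinct sequences in each oligonucleotide pool can be counted. The following is a minimal sketch; the function names `parse_positions`, `to_iupac` and `pool_size` are illustrative assumptions and are not part of the disclosure.

```python
import re

# IUPAC nucleotide ambiguity codes, keyed by the set of bases each code stands for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def parse_positions(oligo):
    """Split bracket notation into one set of allowed bases per position."""
    tokens = re.findall(r"\[[ACGT/]+\]|[ACGT]", oligo)
    return [frozenset(t.strip("[]").replace("/", "")) for t in tokens]


def to_iupac(oligo):
    """Rewrite a bracketed oligo with one IUPAC letter per position."""
    return "".join(IUPAC[p] for p in parse_positions(oligo))


def pool_size(oligo):
    """Number of distinct sequences in the mixture (product of bracket sizes)."""
    n = 1
    for p in parse_positions(oligo):
        n *= len(p)
    return n


oligo_1 = "TGT[G/A]ATCTG[C/T]CT[C/G]AGACC"  # oligonucleotide 1 of Table IV
print(to_iupac(oligo_1), pool_size(oligo_1))
```

For oligonucleotide 1 the IUPAC form agrees with the corresponding entry of the sequence listing (SEQ ID NO: 57), which gives a quick consistency check between the table and the listing.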

[0470] Brackets indicate degeneracy with equal mixture of the specified bases at those positions. The purpose of the degeneracy is to allow this one set of primers to prime all members of the IFN family with similar efficiency. The choice of the oligo driven recombination points is important because they will get “overwritten” in each cycle of breeding and hence cannot coevolve with the rest of the sequence over many cycles of selection.

TABLE V
Oligonucleotides needed for “fine grain” recombination of natural diversity over each of the nine blocks

Block #    Length of oligo required
1          76
2          95
3          65
4          56
5          51
6          93
7          50
8          62
9          80
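Whether a degenerate primer of Table IV covers a given family member can be checked by counting the positions at which the target base is not among the bases allowed by the primer. The sketch below is illustrative only: the function name `mismatches` is an assumption, and the two 18-base targets are copied from the start of SEQ ID NO: 87 and SEQ ID NO: 90 of the sequence listing.

```python
# Bases allowed by each IUPAC nucleotide code.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def mismatches(primer, target):
    """Count positions where the target base is not allowed by the degenerate primer."""
    if len(primer) != len(target):
        raise ValueError("primer and target must have equal length")
    return sum(t not in IUPAC_BASES[p] for p, t in zip(primer.upper(), target.upper()))


primer_1 = "TGTRATCTGYCTSAGACC"      # oligonucleotide 1 of Table IV (SEQ ID NO: 57)
seq87_start = "TGTGATCTGCCTCAGACC"   # first 18 bases of SEQ ID NO: 87
seq90_start = "TGTAATCTGTCTCAAACC"   # first 18 bases of SEQ ID NO: 90

print(mismatches(primer_1, seq87_start))  # 0
print(mismatches(primer_1, seq90_start))  # 1
```

A small mismatch count does not by itself show that priming fails; it only flags positions that the degeneracy does not cover.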

[0471]

TABLE VI
Amino acids that can be reached by a single step mutation in the codon of interest.

Wild-Type     Amino acids reachable
Amino Acid    by one mutation
W             C, R, G, L
Y             F, S, C, H, N, D
F             L, I, V, S, Y, C
L             S, W, F, I, M, V, P
V             F, L, I, M, A, D, E, G
I             F, L, M, V, T, N, K, S, R
A             S, P, T, V, D, E, G
G             V, A, D, E, R, S, C, W
M             L, I, V, T, K, R
S             F, L, Y, C, W, P, T, A, R, G, N, T, I
T             S, P, A, I, M, N, K, S, R
P             S, T, A, L, H, Q, R
C             F, S, Y, R, G, W
N             Y, H, K, D, S, T, I
Q             Y, H, K, E, L, P, R
H             Y, Q, N, D, L, P, R
D             Y, H, N, E, V, A, G
E             Q, K, D, V, A, G
R             L, P, H, Q, C, W, S, G, K, T, I, M
K             Q, N, E, R, T, I, M
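A table of this kind can be derived from the standard genetic code by enumerating, for every codon of a given amino acid, the nine codons that differ from it at exactly one position. The following is a minimal sketch under that assumption. The function name `reachable` is illustrative, stop codons are excluded, and the result need not agree with Table VI entry for entry; for W, for example, this enumeration also reports S (TGG to TCG).

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reachable(aa):
    """Amino acids reachable from any codon of `aa` by one nucleotide change."""
    found = set()
    for codon, source in CODON_TABLE.items():
        if source != aa:
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                target = CODON_TABLE[mutant]
                # Skip stop codons and silent changes.
                if target not in ("*", aa):
                    found.add(target)
    return found


for aa in "MKW":
    print(aa, sorted(reachable(aa)))
```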

[0472] Based on this Table, the polymorphic positions in IFN alpha where all of the diversity can be captured by a degenerate codon have been identified. Oligonucleotides of the length indicated in Table V above with the degeneracies inferred from Table VI are synthesized.
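One way to test whether the diversity at a polymorphic position can be captured by a single degenerate codon is to search all IUPAC codons for those that encode exactly the observed amino acids and nothing else. The sketch below is an assumption about how such a check could be run, not the procedure actually used; the names `encoded` and `exact_degenerate_codons` are hypothetical. As an illustration, position 160 is Lys in SEQ ID NO: 75 and Glu in SEQ ID NO: 80.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Bases allowed by each IUPAC nucleotide code.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def encoded(deg_codon):
    """Set of amino acids (or '*') encoded by a three-letter degenerate codon."""
    choices = (IUPAC_BASES[s] for s in deg_codon)
    return {CODON_TABLE["".join(c)] for c in product(*choices)}


def exact_degenerate_codons(wanted):
    """All degenerate codons that encode exactly the wanted amino acids."""
    wanted = set(wanted)
    return [
        "".join(combo)
        for combo in product(IUPAC_BASES, repeat=3)
        if encoded("".join(combo)) == wanted
    ]


print(exact_degenerate_codons("KE"))  # Lys/Glu: includes RAA
print(exact_degenerate_codons("MW"))  # Met/Trp: no exact codon exists
```

Where no exact codon exists, as for Met/Trp, any degenerate codon covering both residues also encodes additional amino acids, so such a position would need separate oligonucleotides or a tolerated extra residue.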

[0473] Although the foregoing invention has been described in some detail by way of illustration and example for purposes of clarity of understanding, it will be obvious that certain changes and modifications may be practiced within the scope of the appended claims.

[0474] All references cited herein are expressly incorporated in their entirety for all purposes.

1 101 1 50 DNA Artificial Sequence Description of Artificial Sequencedegenerate oligonucleotide used for codon usage library 1 aaccctccagttccgaaccc catatgatga tcaccctgcg taaactgccg 50 2 38 DNA ArtificialSequence Description of Artificial Sequence degenerate oligonucleotideused for codon usage library 2 aaccctccag ttccgaaccc catatgaaaa aaaccgct38 3 40 DNA Artificial Sequence Description of Artificial Sequencedegenerate oligonucleotide used for codon usage library 3 aaccctccagttccgaaccc atatacatat gcgtgctaaa 40 4 44 DNA Artificial SequenceDescription of Artificial Sequence degenerate oligonucleotide used forcodon usage library 4 aaccctccag ttccgaaccc catatgaaat acctgctgcc gacc44 5 40 DNA Artificial Sequence Description of Artificial Sequencedegenerate oligonucleotide used for codon usage library 5 aaccctccagttccgaaccc gatatacata tgaaacagtc 40 6 60 DNA Artificial SequenceDescription of Artificial Sequence degenerate oligonucleotide used forcodon usage library 6 tggtgttatg tctgctcagg cdatggcdgt dgayttycayctggttccgg ttgaagagga 60 7 60 DNA Artificial Sequence Description ofArtificial Sequence degenerate oligonucleotide used for codon usagelibrary 7 ggctggtttc gctaccgttg cdcargcdgc dccdaargay ctggttccggttgaagagga 60 8 60 DNA Artificial Sequence Description of ArtificialSequence degenerate oligonucleotide used for codon usage library 8caccccgatc gctatctctt cyttygcdtc yacyggytcy ctggttccgg ttgaagagga 60 960 DNA Artificial Sequence Description of Artificial Sequence degenerateoligonucleotide used for codon usage library 9 gctgctggct gctcagccggcdatggcdat ggayatyggy ctggttccgg ttgaagagga 60 10 61 DNA ArtificialSequence Description of Artificial Sequence degenerate oligonucleotideused for codon usage library 10 tgccgctgct gttcaccccg gtdacyaargcdgcdcargt dctggttccg gttgaagagg 60 a 61 11 60 DNA Artificial SequenceDescription of Artificial Sequence degenerate oligonucleotide used forcodon usage library 11 cccggctttc tggaaccgtc argcdgcdca rgcdctggacgttgctaaaa aactgcagcc 60 12 60 
DNA Artificial Sequence Description ofArtificial Sequence degenerate oligonucleotide used for codon usagelibrary 12 acgttatcct gttcctgggt gayggyatgg gygtdccdac cgttaccgctacccgtatcc 60 13 60 DNA Artificial Sequence Description of ArtificialSequence degenerate oligonucleotide used for codon usage library 13aaactgggtc cggaaacccc dctggcdatg gaycarttyc cgtacgttgc tctgtctaaa 60 1460 DNA Artificial Sequence Description of Artificial Sequence degenerateoligonucleotide used for codon usage library 14 ggttccggac tctgctggtacygcdacygc dtayctgtgc ggtgttaaag gtaactaccg 60 15 60 DNA ArtificialSequence Description of Artificial Sequence degenerate oligonucleotideused for codon usage library 15 ctgctcgtta caaccagtgc aaracyacycgyggyaayga agttacctct gttatgaacc 60 16 60 DNA Artificial SequenceDescription of Artificial Sequence degenerate oligonucleotide used forcodon usage library 16 tctgttggtg ttgttaccac yacycgygtd carcaygcdtctccggctgg tgcttacgct 60 17 60 DNA Artificial Sequence Description ofArtificial Sequence degenerate oligonucleotide used for codon usagelibrary 17 gtactctgac gctgacctgc cdgcdgaygc dcaratgaac ggttgccaggacatcgctgc 60 18 60 DNA Artificial Sequence Description of ArtificialSequence degenerate oligonucleotide used for codon usage library 18acatcgacgt tatcctgggt ggyggycgya artayatgtt cccggttggt accccggacc 60 1960 DNA Artificial Sequence Description of Artificial Sequence degenerateoligonucleotide used for codon usage library 19 tctgttaacg gtgttcgtaarcgyaarcar aayctggtdc aggcttggca ggctaaacac 60 20 60 DNA ArtificialSequence Description of Artificial Sequence degenerate oligonucleotideused for codon usage library 20 gaaccgtacc gctctgctgc argcdgcdgaygaytcytct gttacccacc tgatgggtct 60 21 60 DNA Artificial SequenceDescription of Artificial Sequence degenerate oligonucleotide used forcodon usage library 21 aatacaacgt tcagcaggac cayacyaarg ayccdacyctgcaggaaatg accgaagttg 60 22 60 DNA Artificial Sequence Description ofArtificial Sequence degenerate oligonucleotide used for 
codon usagelibrary 22 aacccgcgtg gtttctacct gttygtdgar ggyggycgya tcgaccacggtcaccacgac 60 23 60 DNA Artificial Sequence Description of ArtificialSequence degenerate oligonucleotide used for codon usage library 23gaccgaagct ggtatgttcg ayaaygcdat ygcdaargct aacgaactga cctctgaact 60 2460 DNA Artificial Sequence Description of Artificial Sequence degenerateoligonucleotide used for codon usage library 24 ccgctgacca ctctcacgttttytcyttyg gyggytayac cctgcgtggt acctctatct 60 25 60 DNA ArtificialSequence Description of Artificial Sequence degenerate oligonucleotideused for codon usage library 25 gctctggact ctaaatctta yacytcyatyctgtayggya acggtccggg ttacgctctg 60 26 60 DNA Artificial SequenceDescription of Artificial Sequence degenerate oligonucleotide used forcodon usage library 26 cgttaacgac tctacctctg argayccdtc ytaycarcagcaggctgctg ttccgcaggc 60 27 60 DNA Artificial Sequence Description ofArtificial Sequence degenerate oligonucleotide used for codon usagelibrary 27 aagacgttgc tgttttcgct cgyggyccdc argcdcayct ggttcacggtgttgaagaag 60 28 60 DNA Artificial Sequence Description of ArtificialSequence degenerate oligonucleotide used for codon usage library 28atggctttcg ctggttgcgt dgarccdtay acygaytgya acctgccggc tccgaccacc 60 2961 DNA Artificial Sequence Description of Artificial Sequence degenerateoligonucleotide used for codon usage library 29 tgctcacctg gctgcttmaccdccdccdct ggcdctgctg gctggtgcta tgctgctcct 60 c 61 30 62 DNA ArtificialSequence Description of Artificial Sequence degenerate oligonucleotideused for codon usage library 30 ttccgcctct agagaattct tartacagrgthgghgccag gaggagcagc atagcaccag 60 cc 62 31 58 DNA Artificial SequenceDescription of Artificial Sequence degenerate oligonucleotide used forcodon usage library 31 aagcagccag gtgagcagcg tchggratrg argthgcggtggtcggagcc ggcaggtt 58 32 60 DNA Artificial Sequence Description ofArtificial Sequence degenerate oligonucleotide used for codon usagelibrary 32 cgcaaccagc gaaagccatg atrtghgcha craargtytc ttcttcaacaccgtgaacca 
60 33 60 DNA Artificial Sequence Description of ArtificialSequence degenerate oligonucleotide used for codon usage library 33gcgaaaacag caacgtcttc rccrccrtgr gtytcrgahg cctgcggaac agcagcctgc 60 3460 DNA Artificial Sequence Description of Artificial Sequence degenerateoligonucleotide used for codon usage library 34 agaggtagag tcgttaacgtchggrcgrga rccrccrccc agagcgtaac ccggaccgtt 60 35 60 DNA ArtificialSequence Description of Artificial Sequence degenerate oligonucleotideused for codon usage library 35 aagatttaga gtccagagct ttrgahgghgccagrccraa gatagaggta ccacgcaggg 60 36 60 DNA Artificial SequenceDescription of Artificial Sequence degenerate oligonucleotide used forcodon usage library 36 acgtgagagt ggtcagcggt haccagratc agrgtrtccagttcagaggt cagttcgtta 60 37 60 DNA Artificial Sequence Description ofArtificial Sequence degenerate oligonucleotide used for codon usagelibrary 37 gaacatacca gcttcggtca ghgccatrta hgcyttrtcg tcgtggtgaccgtggtcgat 60 38 60 DNA Artificial Sequence Description of ArtificialSequence degenerate oligonucleotide used for codon usage library 38ggtagaaacc acgcgggtta cgrgahacha crcgcaghgc aacttcggtc atttcctgca 60 3960 DNA Artificial Sequence Description of Artificial Sequence degenerateoligonucleotide used for codon usage library 39 tcctgctgaa cgttgtatttcatrtchgch ggytcraaca gacccatcag gtgggtaaca 60 40 60 DNA ArtificialSequence Description of Artificial Sequence degenerate oligonucleotideused for codon usage library 40 cagcagagcg gtacggttcc ahacrtaytghgcrccytgg tgtttagcct gccaagcctg 60 41 60 DNA Artificial SequenceDescription of Artificial Sequence degenerate oligonucleotide used forcodon usage library 41 tacgaacacc gttaacagaa gcrtcrtchg grtaytchgggtccggggta ccaaccggga 60 42 60 DNA Artificial Sequence Description ofArtificial Sequence degenerate oligonucleotide used for codon usagelibrary 42 cccaggataa cgtcgatgtc catrttrtth accagytghg cagcgatgtcctggcaaccg 60 43 60 DNA Artificial Sequence Description of ArtificialSequence degenerate oligonucleotide 
used for codon usage library 43caggtcagcg tcagagtacc arttrcgrtt hacrgtrtga gcgtaagcac cagccggaga 60 4460 DNA Artificial Sequence Description of Artificial Sequence degenerateoligonucleotide used for codon usage library 44 tggtaacaac accaacagatttrcchgcyt tytthgcrcg gttcataaca gaggtaactt 60 45 60 DNA ArtificialSequence Description of Artificial Sequence degenerate oligonucleotideused for codon usage library 45 cactggttgt aacgagcagc hgcrgahacrccratrgtrc ggtagttacc tttaacaccg 60 46 60 DNA Artificial SequenceDescription of Artificial Sequence degenerate oligonucleotide used forcodon usage library 46 accagcagag tccggaacct grcgrtchac rttrtargttttagacagag caacgtacgg 60 47 60 DNA Artificial Sequence Description ofArtificial Sequence degenerate oligonucleotide used for codon usagelibrary 47 gggtttccgg acccagttta ccrttcatyt grccyttcag gatacgggtagcggtaacgg 60 48 60 DNA Artificial Sequence Description of ArtificialSequence degenerate oligonucleotide used for codon usage library 48cccaggaaca ggataacgtt ytthgchgcr gtytgrathg gctgcagttt tttagcaacg 60 4942 DNA Artificial Sequence Description of Artificial Sequence degenerateoligonucleotide used for codon usage library 49 acggttccag aaagccgggtcttcctcttc aaccggaacc ag 42 50 60 DNA Artificial Sequence Description ofArtificial Sequence degenerate oligonucleotide used for codon usagelibrary 50 cctgagcaga cataacacca gchgchachg chachgccag cggcagtttacgcagggtga 60 51 62 DNA Artificial Sequence Description of ArtificialSequence degenerate oligonucleotide used for codon usage library 51accggggtga acagcagcgg cagcaghgcc aghgcratrg trgactgttt catatgtata 60 tc62 52 59 DNA Artificial Sequence Description of Artificial Sequencedegenerate oligonucleotide used for codon usage library 52 gccggctgagcagccagcag cagcagrcch gchgchgcgg tcggcagcag gtagtttca 59 53 60 DNAArtificial Sequence Description of Artificial Sequence degenerateoligonucleotide used for codon usage library 53 aagagatagc gatcggggtggtcaghacra trcccagcag tttagcacgc atatgtatat 60 54 58 DNA 
ArtificialSequence Description of Artificial Sequence degenerate oligonucleotideused for codon usage library 54 caacggtagc gaaaccagcc aghgchachgcrathgcrat agcggttttt ttcatatg 58 55 39 DNA Artificial SequenceDescription of Artificial Sequence degenerate oligonucleotide used forcodon usage library 55 agaattctct agaggcggaa actctccaac tcccaggtt 39 5639 DNA Artificial Sequence Description of Artificial Sequence degenerateoligonucleotide used for codon usage library 56 tgagaggttg agggtccaattgggaggtca aggcttggg 39 57 18 DNA Artificial Sequence Description ofArtificial Sequence degenerate oligonucleotide used for alpha interferonshuffling 57 tgtratctgy ctsagacc 18 58 23 DNA Artificial SequenceDescription of Artificial Sequence degenerate oligonucleotide used foralpha interferon shuffling 58 ggcacaaatg vgmagaatct ctc 23 59 22 DNAArtificial Sequence Description of Artificial Sequence degenerateoligonucleotide used for alpha interferon shuffling 59 agagattctkcbcatttgtg cc 22 60 24 DNA Artificial Sequence Description of ArtificialSequence degenerate oligonucleotide used for alpha interferon shuffling60 cagttccaga agrctsmagc catc 24 61 24 DNA Artificial SequenceDescription of Artificial Sequence degenerate oligonucleotide used foralpha interferon shuffling 61 gatggctksa gycttctgga actg 24 62 19 DNAArtificial Sequence Description of Artificial Sequence degenerateoligonucleotide used for alpha interferon shuffling 62 cttcaatctcttcascaca 19 63 19 DNA Artificial Sequence Description of ArtificialSequence degenerate oligonucleotide used for alpha interferon shuffling63 tgtgstgaag agattgaag 19 64 18 DNA Artificial Sequence Description ofArtificial Sequence degenerate oligonucleotide used for alpha interferonshuffling 64 ggawsagass ctcctaga 18 65 18 DNA Artificial SequenceDescription of Artificial Sequence degenerate oligonucleotide used foralpha interferon shuffling 65 tctaggagss tctswtcc 18 66 21 DNAArtificial Sequence Description of Artificial Sequence degenerateoligonucleotide 
used for alpha interferon shuffling 66 gaacttdwccagcaamtgaa t 21 67 21 DNA Artificial Sequence Description of ArtificialSequence degenerate oligonucleotide used for alpha interferon shuffling67 attcakttgc tggwhaagtt c 21 68 19 DNA Artificial Sequence Descriptionof Artificial Sequence degenerate oligonucleotide used for alphainterferon shuffling 68 ggactycatc ctggctgtg 19 69 19 DNA ArtificialSequence Description of Artificial Sequence degenerate oligonucleotideused for alpha interferon shuffling 69 cacagccagg atgragtcc 19 70 18 DNAArtificial Sequence Description of Artificial Sequence degenerateoligonucleotide used for alpha interferon shuffling 70 aagaatcactctttatct 18 71 18 DNA Artificial Sequence Description of ArtificialSequence degenerate oligonucleotide used for alpha interferon shuffling71 agataaagag tgattctt 18 72 19 DNA Artificial Sequence Description ofArtificial Sequence degenerate oligonucleotide used for alpha interferonshuffling 72 tgggaggttg tcagagcag 19 73 19 DNA Artificial SequenceDescription of Artificial Sequence degenerate oligonucleotide used foralpha interferon shuffling 73 ctgctctgac aacctccca 19 74 18 DNAArtificial Sequence Description of Artificial Sequence degenerateoligonucleotide used for alpha interferon shuffling 74 tcawtccttmctcyttaa 18 75 166 PRT consensus alpha interferon 75 Cys Asp Leu Pro GlnThr His Ser Leu Gly Asn Arg Arg Ala Leu Ile 1 5 10 15 Leu Leu Ala GlnMet Gly Arg Ile Ser Pro Phe Ser Cys Leu Lys Asp 20 25 30 Arg His Asp PheGly Phe Pro Gln Glu Glu Phe Asp Gly Asn Gln Phe 35 40 45 Gln Lys Ala GlnAla Ile Ser Val Leu His Glu Met Ile Gln Gln Thr 50 55 60 Phe Asn Leu PheSer Thr Lys Asp Ser Ser Ala Ala Trp Glu Gln Ser 65 70 75 80 Leu Leu GluLys Phe Ser Thr Glu Leu Tyr Gln Gln Leu Asn Asp Leu 85 90 95 Glu Ala CysVal Ile Gln Glu Val Gly Val Glu Glu Thr Pro Leu Met 100 105 110 Asn GluAsp Ser Ile Leu Ala Val Arg Lys Tyr Phe Gln Arg Ile Thr 115 120 125 LeuTyr Leu Thr Glu Lys Lys Tyr Ser Pro Cys Ala Trp Glu Val Val 130 135 140Arg Ala Glu 
Ile Met Arg Ser Leu Ser Phe Ser Thr Asn Leu Gln Lys 145 150155 160 Arg Leu Arg Arg Lys Asp 165 76 166 PRT human alpha interferon 76Cys Asp Leu Pro Gln Thr His Ser Leu Gly Asn Arg Arg Ala Leu Ile 1 5 1015 Leu Leu Ala Gln Met Gly Arg Ile Ser Pro Phe Ser Cys Leu Lys Asp 20 2530 Arg His Asp Phe Gly Leu Pro Gln Glu Glu Phe Asp Gly Asn Gln Phe 35 4045 Gln Lys Thr Gln Ala Ile Pro Val Leu His Glu Met Ile Gln Gln Thr 50 5560 Phe Asn Leu Phe Ser Thr Glu Asp Ser Ser Ala Ala Trp Glu Gln Ser 65 7075 80 Leu Leu Glu Lys Phe Ser Thr Glu Leu Tyr Gln Gln Leu Asn Asn Leu 8590 95 Glu Ala Cys Val Ile Gln Glu Val Gly Met Glu Glu Thr Pro Leu Met100 105 110 Asn Glu Asp Ser Ile Leu Ala Val Arg Lys Tyr Phe Gln Arg IleThr 115 120 125 Leu Tyr Leu Thr Glu Lys Lys Tyr Ser Pro Cys Ala Trp GluVal Val 130 135 140 Arg Ala Glu Ile Met Arg Ser Leu Ser Phe Ser Thr AsnLeu Gln Lys 145 150 155 160 Arg Leu Arg Arg Lys Asp 165 77 166 PRT humanalpha interferon 77 Cys Asp Leu Pro Gln Thr His Ser Leu Gly Asn Arg ArgAla Leu Ile 1 5 10 15 Leu Leu Ala Gln Met Gly Arg Ile Ser Pro Phe SerCys Leu Lys Asp 20 25 30 Arg Pro Asp Phe Gly Leu Pro Gln Glu Glu Phe AspGly Asn Gln Phe 35 40 45 Gln Lys Thr Gln Ala Ile Ser Val Leu His Glu MetIle Gln Gln Thr 50 55 60 Phe Asn Leu Phe Ser Thr Glu Asp Ser Ser Ala AlaTrp Glu Gln Ser 65 70 75 80 Leu Leu Glu Lys Phe Ser Thr Glu Leu Tyr GlnGln Leu Asn Asn Leu 85 90 95 Glu Ala Cys Val Ile Gln Glu Val Gly Met GluGlu Thr Pro Leu Met 100 105 110 Asn Glu Asp Ser Ile Leu Ala Val Arg LysTyr Phe Gln Arg Ile Thr 115 120 125 Leu Tyr Leu Thr Glu Lys Lys Tyr SerPro Cys Ala Trp Glu Val Val 130 135 140 Arg Ala Glu Ile Met Arg Ser LeuSer Phe Ser Thr Asn Leu Gln Lys 145 150 155 160 Ile Leu Arg Arg Lys Asp165 78 166 PRT human alpha interferon 78 Cys Asn Leu Ser Gln Thr His SerLeu Asn Asn Arg Arg Thr Leu Met 1 5 10 15 Leu Leu Ala Gln Met Arg ArgIle Ser Pro Phe Ser Cys Leu Lys Asp 20 25 30 Arg His Asp Phe Glu Phe ProGln Glu Glu Phe Asp Gly Asn Gln Phe 35 40 45 Gln Lys Ala Gln Ala Ile SerVal Leu His Glu 
Met Met Gln Gln Thr 50 55 60 Phe Asn Leu Phe Ser Thr LysAsn Ser Ser Ala Ala Trp Asp Glu Thr 65 70 75 80 Leu Leu Glu Lys Phe TyrIle Glu Leu Phe Gln Gln Met Asn Asp Leu 85 90 95 Glu Ala Cys Val Ile GlnGlu Val Gly Val Glu Glu Thr Pro Leu Met 100 105 110 Asn Glu Asp Ser IleLeu Ala Val Lys Lys Tyr Phe Gln Arg Ile Thr 115 120 125 Leu Tyr Leu MetGlu Lys Lys Tyr Ser Pro Cys Ala Trp Glu Val Val 130 135 140 Arg Ala GluIle Met Arg Ser Leu Ser Phe Ser Thr Asn Leu Gln Lys 145 150 155 160 ArgLeu Arg Arg Lys Asp 165 79 166 PRT human alpha interferon 79 Cys Asp LeuPro Gln Thr His Ser Leu Gly Asn Arg Arg Ala Leu Ile 1 5 10 15 Leu LeuAla Gln Met Gly Arg Ile Ser His Phe Ser Cys Leu Lys Asp 20 25 30 Arg HisAsp Phe Gly Phe Pro Glu Glu Glu Phe Asp Gly His Gln Phe 35 40 45 Gln LysThr Gln Ala Ile Ser Val Leu His Glu Met Ile Gln Gln Thr 50 55 60 Phe AsnLeu Phe Ser Thr Glu Asp Ser Ser Ala Ala Trp Glu Gln Ser 65 70 75 80 LeuLeu Glu Lys Phe Ser Thr Glu Leu Tyr Gln Gln Leu Asn Asp Leu 85 90 95 GluAla Cys Val Ile Gln Glu Val Gly Val Glu Glu Thr Pro Leu Met 100 105 110Asn Val Asp Ser Ile Leu Ala Val Arg Lys Tyr Phe Gln Arg Ile Thr 115 120125 Leu Tyr Leu Thr Glu Lys Lys Tyr Ser Pro Cys Ala Trp Glu Val Val 130135 140 Arg Ala Glu Ile Met Arg Ser Leu Ser Phe Ser Thr Asn Leu Gln Lys145 150 155 160 Arg Leu Arg Arg Lys Asp 165 80 166 PRT human alphainterferon 80 Cys Asp Leu Pro Gln Thr His Ser Leu Gly His Arg Arg ThrMet Met 1 5 10 15 Leu Leu Ala Gln Met Arg Arg Ile Ser Leu Phe Ser CysLeu Lys Asp 20 25 30 Arg His Asp Phe Arg Phe Pro Gln Glu Glu Phe Asp GlyAsn Gln Phe 35 40 45 Gln Lys Ala Glu Ala Ile Ser Val Leu His Glu Val IleGln Gln Thr 50 55 60 Phe Asn Leu Phe Ser Thr Lys Asp Ser Ser Val Ala TrpAsp Glu Arg 65 70 75 80 Leu Leu Asp Lys Leu Tyr Thr Glu Leu Tyr Gln GlnLeu Asn Asp Leu 85 90 95 Glu Ala Cys Val Met Gln Glu Val Trp Val Gly GlyThr Pro Leu Met 100 105 110 Asn Glu Asp Ser Ile Leu Ala Val Arg Lys TyrPhe Gln Arg Ile Thr 115 120 125 Leu Tyr Leu Thr Glu Lys Lys Tyr Ser ProCys Ala Trp Glu Val Val 130 
135 140 Arg Ala Glu Ile Met Arg Ser Phe SerSer Ser Arg Asn Leu Gln Glu 145 150 155 160 Arg Leu Arg Arg Lys Glu 16581 166 PRT human alpha interferon 81 Cys Asp Leu Pro Gln Thr His Ser LeuArg Asn Arg Arg Ala Leu Ile 1 5 10 15 Leu Leu Ala Gln Met Gly Arg IleSer Pro Phe Ser Cys Leu Lys Asp 20 25 30 Arg His Glu Phe Arg Phe Pro GluGlu Glu Phe Asp Gly His Gln Phe 35 40 45 Gln Lys Thr Gln Ala Ile Ser ValLeu His Glu Met Ile Gln Gln Thr 50 55 60 Phe Asn Leu Phe Ser Thr Glu AspSer Ser Ala Ala Trp Glu Gln Ser 65 70 75 80 Leu Leu Glu Lys Phe Ser ThrGlu Leu Tyr Gln Gln Leu Asn Asp Leu 85 90 95 Glu Ala Cys Val Ile Gln GluVal Gly Val Glu Glu Thr Pro Leu Met 100 105 110 Asn Glu Asp Phe Ile LeuAla Val Arg Lys Tyr Phe Gln Arg Ile Thr 115 120 125 Leu Tyr Leu Met GluLys Lys Tyr Ser Pro Cys Ala Trp Glu Val Val 130 135 140 Arg Ala Glu IleMet Arg Ser Phe Ser Phe Ser Thr Asn Leu Lys Lys 145 150 155 160 Gly LeuArg Arg Lys Asp 165 82 166 PRT human alpha interferon 82 Cys Asp Leu ProGln Thr His Ser Leu Gly Asn Arg Arg Ala Leu Ile 1 5 10 15 Leu Leu AlaGln Met Arg Arg Ile Ser Pro Phe Ser Cys Leu Lys Asp 20 25 30 Arg His AspPhe Glu Phe Pro Gln Glu Glu Phe Asp Asp Lys Gln Phe 35 40 45 Gln Lys AlaGln Ala Ile Ser Val Leu His Glu Met Ile Gln Gln Thr 50 55 60 Phe Asn LeuPhe Ser Thr Lys Asp Ser Ser Ala Ala Leu Asp Glu Thr 65 70 75 80 Leu LeuAsp Glu Phe Tyr Ile Glu Leu Asp Gln Gln Leu Asn Asp Leu 85 90 95 Glu SerCys Val Met Gln Glu Val Gly Val Ile Glu Ser Pro Leu Met 100 105 110 TyrGlu Asp Ser Ile Leu Ala Val Arg Lys Tyr Phe Gln Arg Ile Thr 115 120 125Leu Tyr Leu Thr Glu Lys Lys Tyr Ser Ser Cys Ala Trp Glu Val Val 130 135140 Arg Ala Glu Ile Met Arg Ser Phe Ser Leu Ser Ile Asn Leu Gln Lys 145150 155 160 Arg Leu Lys Ser Lys Glu 165 83 166 PRT human alphainterferon 83 Cys Asp Leu Pro Glu Thr His Ser Leu Asp Asn Arg Arg ThrLeu Met 1 5 10 15 Leu Leu Ala Gln Met Ser Arg Ile Ser Pro Ser Ser CysLeu Met Asp 20 25 30 Arg His Asp Phe Gly Phe Pro Gln Glu Glu Phe Asp GlyAsn Gln Phe 35 40 45 Gln Lys Ala Pro Ala Ile 
Ser Val Leu His Glu Leu IleGln Gln Ile 50 55 60 Phe Asn Leu Phe Thr Thr Lys Asp Ser Ser Ala Ala TrpAsp Glu Asp 65 70 75 80 Leu Leu Asp Lys Phe Cys Thr Glu Leu Tyr Gln GlnLeu Asn Asp Leu 85 90 95 Glu Ala Cys Val Met Gln Glu Glu Arg Val Gly GluThr Pro Leu Met 100 105 110 Asn Ala Asp Ser Ile Leu Ala Val Lys Lys TyrPhe Arg Arg Ile Thr 115 120 125 Leu Tyr Leu Thr Glu Lys Lys Tyr Ser ProCys Ala Trp Glu Val Val 130 135 140 Arg Ala Glu Ile Met Arg Ser Leu SerLeu Ser Thr Asn Leu Gln Glu 145 150 155 160 Arg Leu Arg Arg Lys Glu 16584 166 PRT human alpha interferon 84 Cys Asp Leu Pro Gln Thr His Ser LeuGly Asn Arg Arg Ala Leu Ile 1 5 10 15 Leu Leu Ala Gln Met Gly Arg IleSer Pro Phe Ser Cys Leu Lys Asp 20 25 30 Arg His Asp Phe Gly Phe Pro GlnGlu Glu Phe Asp Gly Asn Gln Phe 35 40 45 Gln Lys Ala Gln Ala Ile Ser ValLeu His Glu Met Ile Gln Gln Thr 50 55 60 Phe Asn Leu Phe Ser Thr Lys AspSer Ser Ala Ile Trp Glu Gln Ser 65 70 75 80 Leu Leu Glu Lys Phe Ser ThrGlu Leu Asn Gln Gln Leu Asn Asp Met 85 90 95 Glu Ala Cys Val Ile Gln GluVal Gly Val Glu Glu Thr Pro Leu Met 100 105 110 Asn Val Asp Ser Ile LeuAla Val Lys Lys Tyr Phe Gln Arg Ile Thr 115 120 125 Leu Tyr Leu Thr GluLys Lys Tyr Ser Pro Cys Ala Trp Glu Val Val 130 135 140 Arg Ala Glu IleMet Arg Ser Phe Ser Leu Ser Lys Ile Phe Gln Glu 145 150 155 160 Arg LeuArg Arg Lys Ser 165 85 166 PRT human alpha interferon 85 Cys Asp Leu ProGln Thr His Ser Leu Gly Asn Arg Arg Ala Leu Ile 1 5 10 15 Leu Leu AlaGln Met Gly Arg Ile Ser Pro Phe Ser Cys Leu Lys Asp 20 25 30 Arg Pro AspPhe Gly Leu Pro Gln Glu Glu Phe Asp Gly Asn Gln Phe 35 40 45 Gln Lys ThrGln Ala Ile Ser Val Leu His Glu Met Ile Gln Gln Thr 50 55 60 Phe Asn LeuPhe Ser Thr Glu Asp Ser Ser Ala Ala Trp Glu Gln Ser 65 70 75 80 Leu LeuGlu Lys Phe Ser Thr Glu Leu Tyr Gln Gln Leu Asn Asn Leu 85 90 95 Glu AlaCys Val Ile Gln Glu Val Gly Met Glu Glu Thr Pro Leu Met 100 105 110 AsnGlu Asp Ser Ile Leu Ala Val Arg Lys Tyr Phe Gln Arg Ile Thr 115 120 125Leu Tyr Leu Thr Glu Lys Lys Tyr Ser Pro Cys Ala 
Trp Glu Val Val 130 135140 Arg Ala Glu Ile Met Arg Ser Leu Ser Phe Ser Thr Asn Leu Gln Lys 145150 155 160 Ile Leu Arg Arg Lys Asp 165 86 166 PRT human alphainterferon 86 Cys Asp Leu Pro Gln Thr His Ser Leu Gly Asn Arg Arg AlaLeu Ile 1 5 10 15 Leu Leu Ala Gln Met Gly Arg Ile Ser His Phe Ser CysLeu Lys Asp 20 25 30 Arg Tyr Asp Phe Gly Phe Pro Gln Glu Val Phe Asp GlyAsn Gln Phe 35 40 45 Gln Lys Ala Gln Ala Ile Ser Ala Phe His Glu Met IleGln Gln Thr 50 55 60 Phe Asn Leu Phe Ser Thr Lys Asp Ser Ser Ala Ala TrpAsp Glu Thr 65 70 75 80 Leu Leu Asp Lys Phe Tyr Ile Glu Leu Phe Gln GlnLeu Asn Asp Leu 85 90 95 Glu Ala Cys Val Thr Gln Glu Val Gly Val Glu GluIle Ala Leu Met 100 105 110 Asn Glu Asp Ser Ile Leu Ala Val Arg Lys TyrPhe Gln Arg Ile Thr 115 120 125 Leu Tyr Leu Met Gly Lys Lys Tyr Ser ProCys Ala Trp Glu Val Val 130 135 140 Arg Ala Glu Ile Met Arg Ser Phe SerPhe Ser Thr Asn Leu Gln Lys 145 150 155 160 Gly Leu Arg Arg Lys Asp 16587 501 DNA consensus alpha interferon 87 tgtgatctgc ctcagacccacagcctgggt aataggaggg ccttgatact cctggcacaa 60 atgggaagaa tctctcctttctcctgcctg aaggacagac atgactttgg atttccccag 120 gaggagtttg atggcaaccagttccagaag gctcaagcca tctctgtcct ccatgagatg 180 atccagcaga ccttcaatctcttcagcaca aaggactcat ctgctgcttg ggatgagagc 240 ctcctagaaa aattttccactgaactttac cagcaactga atgacctgga agcctgtgtg 300 atacaggagg ttggggtggaagagactccc ctgatgaatg aggactccat cctggctgtg 360 aggaaatact tccaaagaatcactctttat ctgacagaga agaaatacag cccttgtgcc 420 tgggaggttg tcagagcagaaatcatgaga tccttctctt tttcaacaaa cttgcaaaaa 480 agattaagga ggaaggattg a501 88 501 DNA human alpha interferon 88 tgtgatctgc ctcagacccacagcctgggt aataggaggg ccttgatact cctggcacaa 60 atgggaagaa tctctcctttctcctgcctg aaggacagac atgactttgg acttccccag 120 gaggagtttg atggcaaccagttccagaag actcaagcca tccctgtcct ccatgagatg 180 atccagcaga ccttcaatctcttcagcaca gaggactcat ctgctgcttg ggaacagagc 240 ctcctagaaa aattttccactgaactttac cagcaactga ataacctgga agcatgtgtg 300 atagaggagg ttgggatggaagagactccc ctgatgaatg aggactccat 
cctggctgtg 360 aggaaatact tccaaagaatcactctttat ctaacagaga agaaatacag cccttgtgcc 420 tgggaggttg tcagagcagaaatcatgaga tccctctctt tttcaacaaa cttgcaaaaa 480 agattaagga ggaaggattg a501 89 501 DNA human alpha interferon 89 tgtgatctgc ctcagacccacagcctgggt aataggaggg ccttgatact cctggcacaa 60 atgggaagaa tctctcctttctcctgcctg aaggacagac ctgactttgg acttccccag 120 gaggagtttg atggcaaccagttccagaag actcaagcca tctctgtcct ccatgagatg 180 atccagcaga ccttcaatctcttcagcaca gaggactcat ctgctgcttg ggaacagagc 240 ctcctagaaa aattttccactgaactttac cagcaactga ataacctgga agcatgtgtg 300 atacaggagg ttgggatggaagagactccc ctgatgaatg aggactccat cctggctgtg 360 aggaaatact tccaaagaatcactctttat ctaacagaga agaaatacag cccttgtgcc 420 tgggaggttg tcagagcagaaatcatgaga tctctctctt tttcaacaaa cttgcaaaaa 480 atattaagga ggaaggattg a501 90 501 DNA human alpha interferon 90 tgtaatctgt ctcaaacccacagcctgaat aacaggagga ctttgatgct catggcacaa 60 atgaggagaa tctctcctttctcctgcctg aaggacagac atgactttga atttccccag 120 gaggaatttg atggcaaccagttccagaaa gctcaagcca tctctgtcct ccatgagatg 180 atgcagcaga ccttcaatctcttcagcaca aagaactcat ctgctgcttg ggatgagacc 240 ctcctagaaa aattctacattgaacttttc cagcaaatga atgacctgga agcctgtgtg 300 atacaggagg ttggggtggaagagactccc ctgatgaatg aggactccat cctggctgtg 360 aagaaatact tccaaagaatcactctttat ctgatggaga agaaatacag cccttgtgcc 420 tgggaggttg tcagagcagaaatcatgaga tccctctctt tttcaacaaa cttgcaaaaa 480 agattaagga ggaaggattg a501 91 501 DNA human alpha interferon 91 tgtgatctgc ctcagacccacagcctgggt aataggaggg ccttgatact cctggcacaa 60 atgggaagaa tctctcctttctcatgcctg aaggacagac atgatttcgg attccccgag 120 gaggagtttg atggccaccagttccagaag actcaagcca tctctgtcct ccatgagatg 180 atccagcaga ccttcaatctcttcagcaca gaggactcat ctgctgcttg ggaacagagc 240 ctcctagaaa aattttccactgaactttac cagcaactga atgacctgga agcatgtgtg 300 atacaggagg ttggggtggaagagactccc ctgatgaatg tggactccat cctggctgtg 360 aggaaatact tccaaagaatcactctttat ctaacagaga agaaatacag cccttgtgcc 420 tgggaggttg tcagagcagaaatcatgaga tccctctcgt tttcaacaaa cttgcaaaaa 480 
agattaagga ggaaggattg a501 92 501 DNA human alpha interferon 92 tgtgatctgc ctcagacccacagcctgggt cacaggagga ccatgatgct cctggcacaa 60 atgaggagaa tctctcttttctcctgtctg aaggacagac atgacttcag atttccccag 120 gaggagtttg atggcaaccagttccagaag gctgaagcca tctctgtcct ccatgaggtg 180 attcagcaga ccttcaatctcttcagcaca aaggactcat ctgttgcttg ggatgagagg 240 cttctagaca aactctatactgaactttac cagcagctga atgacctgga agcctgtgtg 300 atgcaggagg tgtgggtgggagggactccc ctgatgaatg aggactccat cctggctgtg 360 agaaaatact tccaaagaatcactctctac ctgacagaga aaaagtacag cccttgtgcc 420 tgggaggttg tcagagcagaaatcatgaga tccttctctt catcaagaaa cttgcaagaa 480 aggttaagga ggaaggaata a501 93 501 DNA human alpha interferon 93 tgtgatctgc ctcagacccacagcctgcgt aataggaggg ccttgatact cctggcacaa 60 atgggaagaa tctctcctttctcctgcttg aaggacagac atgaattcag attcccagag 120 gaggagtttg atggccaccagttccagaag actcaagcca tctctgtcct ccatgagatg 180 atccagcaga ccttcaatctcttcagcaca gaggactcat ctgctgcttg ggaacagagc 240 ctcctagaaa aattttccactgaactttac cagcaactga atgacctgga agcatgtgtg 300 atacaggagg ttggggtggaagagactccc ctgatgaatg aggactccat cctggctgtg 360 aggaaatact tccaaagaatcactctttat ctaatggaga agaaatacag cccttgtgcc 420 tgggaggttg tcagagcagaaatcatgaga tccttctctt tttcaacaaa cttgaaaaaa 480 ggattaagga ggaaggattg a501 94 501 DNA human alpha interferon 94 tgtgatctgc ctcagactcacagcctgggt aacaggaggg ccttgatact cctggcacaa 60 atgcgaagaa tctctcctttctcctgcctg aaggacagac atgactttga attcccccag 120 gaggagtttg atgataaacagttccagaag gctcaagcca tctctgtcct ccatgagatg 180 atccagcaga ccttcaacctcttcagcaca aaggactcat ctgctgcttt ggatgagacc 240 cttctagatg aattctacatcgaacttgac cagcagctga atgacctgga gtcctgtgtg 300 atgcaggaag tgggggtgatagagtctccc ctgatgaatg aggacttcat cctggctgtg 360 aggaaatact tccaaagaatcactctatat ctgacagaga agaaatacag ctcttgtgcc 420 tgggaggttg tcagagcagaaatcatgaga tccttctctt tatcaatcaa cttgcaaaaa 480 agattgaaga gtaaggaatg a501 95 501 DNA human alpha interferon 95 tgtgatctcc ctgagacccacagcctggat aacaggagga ccttgatgct cctggcacaa 60 atgagcagaa 
tctctccttcctcctgtctg atggacagac atgactttgg atttccccag 120 gaggagtttg atggcaaccagttccagaag gctccagcca tctctgtcct ccatgagctg 180 atccagcaga tcttcaacctcttctccaca aaagattcat ctgctgcttg ggatgaggac 240 ctcctagaca aattctgcaccgaactctac cagcagctga atgacttgga agcctgtgtg 300 atgcaggagg agagggtgggagaaactccc ctgatgtacg cggactccat cctggctgtg 360 aagaaatact tccaaagaatcactctctat ctgacagaga agaaatacag cccttgtgcc 420 tgggaggttg tcagagcagaaatcatgaga tccctctctt tatcaacaaa cttgcaagaa 480 agattaagga ggaaggaata a501 96 501 DNA human alpha interferon 96 tgtgatctgc ctcagacccacagcctgggt aataggaggg ccttgatact cctggcacaa 60 atgggaagaa tctctcctttctcctgcctg aaggacagac atgactttgg attcccccaa 120 gaggagtttg atggcaaccagttccagaag gctcaagcca tctctgtcct ccatgagatg 180 atccagcaga ccttcaatctcttcagcaca aaggactcat ctgctacttg ggaacagagc 240 ctcctagaaa aattttccactgaacttaac cagcagctga atgacatgga agcctgcgtg 300 atacaggagg ttggggtggaagagactccc ctgatgaatg tggactctat cctggctgtg 360 aagaaatact tccaaagaatcactctttat ctgacagaga agaaatacag cccttgtgct 420 tgggaggttg tcagagcagaaatcatgaga tccttctctt tatcaaaaat ttttcaagaa 480 agattaagga ggaaggaatg a501 97 501 DNA human alpha interferon 97 tgtgatctgc ctcagacccacagcctgggt aataggaggg ccttgatact cctggcacaa 60 atgggaagaa tctctcctttctcctgcctg aaggacagac ctgactttgg acttccccag 120 gaggagtttg atggcaaccagttccagaag actcaagcca tctctgtcct ccatgagatg 180 atccagcaga ccttcaatctcttcagcaca gaggactcat ctgctgcttg ggaacagagc 240 ctcctagaaa aattttccactgaactttac cagcaactga ataacctgga agcatgtgtg 300 atacaggagg ttgggatggaagagactccc ctgatgaatg aggactccat cttggctgtg 360 aggaaatact tccaaagaatcactctttat ctaacagaga agaaatacag cccttgtgcc 420 tgggaggttg tcagagcagaaatcatgaga tctctctctt tttcaacaaa cttgcaaaaa 480 agattaagga ggaaggattg a501 98 501 DNA human alpha interferon 98 tgtgatctgc ctcagactcacagcctgggt aataggaggg ccttgatact cctggcacaa 60 atgggaagaa tctctcatttctcctgcctg aaggacagat atgatttcgg attcccccag 120 gaggtgtttg atggcaaccagttccagaag gctcaagcca tctctgcctt ccatgagatg 180 atccagcaga 
ccttcaatctcttcagcaca aaggattcat ctgctgcttg ggatgagacc 240 ctcctagaca aattctacattgaacttttc cagcaactga atgacctaga agcctgtgtg 300 acacaggagg ttggggtggaagagattgcc ctgatgaatg aggactccat cctggctgtg 360 aggaaatact ttcaaagaatcactctttat ctgatggaga agaaatacag cccttgtgcc 420 tgggaggttg tcagagcagaaatcatgaga tccttctctt tttcaacaaa cttgcaaaaa 480 ggattaagaa ggaaggattg a501 99 11 PRT Artificial Sequence Description of Artificial SequenceProtease peptide substrate 99 Arg Gly Val Val Asn Ala Ser Ser Arg LeuAla 1 5 10 100 44 DNA Artificial Sequence Description of ArtificialSequence Introduced Sfi I site 100 ttccatttca tacatggccg aaggggccgtgccatgagga tttt 44 101 50 DNA Artificial Sequence Description ofArtificial Sequence Introduced sfi I site 101 ttctaaatgc atgttggcctccttggccgg attctgagcc ttcaggacca 50
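The Sfi I sites introduced in SEQ ID NOS: 100 and 101 can be located with a short pattern search. The following is a minimal sketch, not part of the original listing. It assumes the commonly cited Sfi I recognition sequence GGCCNNNNNGGCC, and the function name `find_sfi_sites` is hypothetical. The two sequences are copied from the listing above with the spacing removed.

```python
import re

# Assumption: Sfi I recognizes GGCCNNNNNGGCC (two GGCC half-sites
# separated by five unspecified bases). A lookahead is used so that
# overlapping candidate sites would also be reported.
SFI_I = re.compile(r"(?=(ggcc[acgt]{5}ggcc))")

# SEQ ID NOS: 100 and 101 from the sequence listing, spacing removed.
SEQ_100 = "ttccatttcatacatggccgaaggggccgtgccatgaggatttt"
SEQ_101 = "ttctaaatgcatgttggcctccttggccggattctgagccttcaggacca"


def find_sfi_sites(seq):
    """Return (0-based start, matched site) for each Sfi I site in seq."""
    return [(m.start(), m.group(1)) for m in SFI_I.finditer(seq.lower())]


if __name__ == "__main__":
    for name, seq in (("SEQ ID NO: 100", SEQ_100), ("SEQ ID NO: 101", SEQ_101)):
        for start, site in find_sfi_sites(seq):
            spacer = site[4:9]  # the five bases between the GGCC half-sites
            print(name, "length", len(seq), "site at", start, site, "spacer", spacer)
```

The five-base spacer differs between the two oligonucleotides, which is what allows ends produced at such sites to be non-palindromic and distinguishable from one another.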

What is claimed is:
 1. A method for evolving a protein encoded by a DNA substrate molecule comprising: (a) digesting at least a first and second DNA substrate molecule, wherein the at least a first and second substrate molecules differ from each other in at least one nucleotide, with a restriction endonuclease; (b) ligating the mixture to generate a library of recombinant DNA molecules; (c) screening or selecting the products of (b) for a desired property; and (d) recovering a recombinant DNA substrate molecule encoding an evolved protein.
 2. The method of claim 1, wherein the restriction endonuclease generates non-palindromic ends at cleavage sites.
 3. The method of claim 1, wherein the substrate molecules have been engineered to contain at least one recognition site for a restriction endonuclease having non-palindromic ends at cleavage sites.
 4. The method of claim 1, wherein (a)-(d) are repeated.
 5. The method of claim 1, wherein the DNA substrate molecule comprises a gene cluster.
 6. The method of claim 1, wherein at least one restriction endonuclease fragment from a DNA substrate molecule is isolated and subjected to mutagenesis to generate a library of mutant fragments.
 7. The method of claim 6, wherein the library of mutant fragments is used in the ligation of (b).
 8. The method of claim 7, wherein the DNA substrate molecule encodes all or part of a protein selected from Table I.
 9. The method of claim 6, wherein mutagenesis comprises recursive sequence recombination.
 10. The method of claim 1, wherein the products of (d) are subjected to mutagenesis.
 11. The method of claim 10, wherein mutagenesis comprises recursive sequence recombination.
 12. The method of claim 1, wherein the products of (d) are used as a DNA substrate molecule in (b).
 13. The method of claim 10, wherein the products of claim 10 are used in (d).
 14. The method of claim 1, wherein the recombinant DNA substrate molecule of (d) comprises a library of recombinant DNA substrate molecules.
 15. An evolved protein produced by the method of claim 1.
 16. A method for evolving a protein encoded by a DNA substrate molecule by recombining at least a first and second DNA substrate molecule, wherein the at least a first and second substrate molecules differ from each other in at least one nucleotide and comprise defined segments, the method comprising: (a) providing a set of oligonucleotide PCR primers, comprising at least one primer for each strand of each segment, wherein the primer sequence is complementary to at least one junction with another segment; (b) amplifying the segments of the at least a first and second DNA substrate molecules with the primers of step (a) in a polymerase chain reaction; (c) assembling the products of step (b) to generate a library of recombinant DNA substrate molecules; (d) screening or selecting the products of (c) for a desired property; and (e) recovering a recombinant DNA substrate molecule from (d) encoding an evolved protein.
 17. The method of claim 16, wherein the at least a first and second DNA substrate molecules are subjected to mutagenesis prior to step (a).
 18. The method of claim 16, wherein the at least a first and second DNA substrate molecules comprise alleles of a gene.
 19. The method of claim 16, wherein the at least a first and second DNA substrate molecules comprise a library of mutants.
 20. The method of claim 16, wherein the segments are defined by sites within intergenic regions.
 21. The method of claim 16, wherein the segments are defined by sites within introns.
 22. The method of claim 16, wherein the primers comprise a uracil substitution at one or more thymidine residues.
 23. The method of claim 22, wherein the products of (b) are treated with uracil glycosylase.
 24. The method of claim 16, wherein (a)-(e) are repeated.
 25. The method of claim 16, wherein the at least a first and second DNA substrate molecule comprises a gene cluster.
 26. The method of claim 16, wherein the at least first and second DNA substrate molecule encodes all or part of a DNA polymerase.
 27. The method of claim 16, wherein at least one PCR primer differs from the at least a first and second DNA substrate molecules in at least one nucleotide.
 28. The method of claim 27, wherein the PCR primer comprises a nucleotide sequence of a known mutant or polymorphism of the at least a first or second DNA substrate molecule.
 29. The method of claim 28, wherein the PCR primer is degenerate and encodes the nucleotide sequences of more than one known mutant or polymorphism of the at least a first or second DNA substrate molecule.
 30. The method of claim 29, wherein the at least a first and second DNA substrate molecule encodes all or part of a protein selected from Table I.
 31. The method of claim 17, wherein mutagenesis comprises recursive sequence recombination.
 32. The method of claim 16, wherein the products of (e) are subjected to mutagenesis.
 33. The method of claim 32, wherein mutagenesis comprises recursive sequence recombination.
 34. The method of claim 32, wherein the products of claim 32 are used in (b).
 35. The method of claim 16, wherein the products of (e) are used as a DNA substrate molecule in (b).
 36. The method of claim 16, wherein the recombinant DNA substrate molecule of (e) comprises a library of recombinant DNA substrate molecules.
 37. An evolved protein produced by the method of claim 16.
 38. A method of enriching a population of DNA fragments for mutant sequences comprising: (a) denaturing and renaturing the population of fragments to generate a population of hybrid double-stranded fragments in which at least one double-stranded fragment comprises at least one base pair mismatch; (b) fragmenting the products of (a) into fragments of about 20-100 bp; (c) affinity-purifying fragments having a mismatch on an affinity matrix to generate a pool of DNA fragments enriched for mutant sequences; and (d) assembling the products of (c) to generate a library of recombinant DNA substrate molecules.
 39. The method of claim 38, wherein the population of DNA fragments is derived from at least a first and second DNA substrate molecule, the at least a first and second DNA substrate molecule differing from each other in at least one nucleotide.
 40. The method of claim 39, wherein the at least a first and second DNA substrate molecules are obtained by mutagenesis of a DNA substrate molecule.
 41. The method of claim 39, wherein the at least a first and second DNA substrate molecules comprise alleles of a gene.
 42. The method of claim 39, wherein the at least a first and second DNA substrate molecules comprise polymorphic variants of a gene.
 43. The method of claim 38, wherein the DNA substrate molecule encodes all or part of a protein selected from Table I.
 44. The method of claim 38, wherein the products of (c) are mixed with the products of (a) prior to (d).
 45. A method for evolving a protein encoded by a DNA substrate molecule, by recombining at least a first and second DNA substrate molecule, wherein the at least a first and second substrate molecules share a region of sequence homology of about 10 to 100 base pairs and comprise defined segments, the method comprising: (a) providing regions of homology in the at least a first and second DNA substrate molecules by inserting an intron sequence between at least two defined segments; (b) fragmenting and recombining DNA substrate molecules of (a), wherein regions of homology are provided by the introns; (c) screening or selecting the products of (b) for a desired property; and (d) recovering a recombinant DNA substrate molecule from the products of (c) encoding an evolved protein.
 46. The method of claim 45, wherein the introns are self-splicing.
 47. The method of claim 45, wherein the inserted introns comprise from about 1 to about 10 nonhomologous introns.
 48. The method of claim 45, wherein the intron comprises a recognition site for a restriction endonuclease having non-palindromic ends at cleavage sites.
 49. The method of claim 45, wherein (b)-(d) are repeated.
 50. The method of claim 45, wherein the DNA substrate molecule comprises a gene cluster.
 51. The method of claim 45, wherein at least one segment from a DNA substrate molecule is isolated and subjected to mutagenesis to generate a library of mutant fragments.
 52. The method of claim 51, wherein the library of mutant segments is used in the recombination of (b).
 53. The method of claim 45, wherein the segments are defined by exons.
 54. The method of claim 45, wherein the segments are defined by intergenic regions.
 55. The method of claim 45, wherein the at least a first and second DNA substrate molecules encode protein homologues.
 56. The method of claim 45, wherein the intron contains a lox site, and wherein the products of (b) are used to transfect a Cre⁺ host.
 57. The method of claim 45, wherein the at least a first and second DNA substrate molecule encodes all or part of a protein selected from Table I.
 58. The method of claim 45, wherein the at least a first and second DNA substrate molecule are subjected to mutagenesis prior to step (a).
 59. The method of claim 58, wherein mutagenesis comprises recursive sequence recombination.
 60. The method of claim 45, wherein the products of (d) are subjected to mutagenesis.
 61. The method of claim 58, wherein mutagenesis comprises recursive sequence recombination.
 62. The method of claim 45, wherein the products of (d) are used as a DNA substrate molecule in (b).
 63. The method of claim 45, wherein the recombinant DNA substrate molecule of (d) comprises a library of recombinant DNA substrate molecules.
 64. An evolved protein produced by the method of claim 45.
 65. A method for evolving a protein encoded by a DNA substrate molecule by recombining at least a first and second DNA substrate molecule, wherein the at least a first and second substrate molecules differ from each other in at least one nucleotide and comprise defined segments, the method comprising: (a) providing a set of oligonucleotide PCR primers, wherein for each junction of segments a pair of primers is provided, one member of each pair bridging the junction at one end of a segment and the other bridging the junction at the other end of the segment, with the terminal ends of the DNA molecule having as one member of the pair a generic primer, and wherein a set of primers is provided for each of the at least a first and second substrate molecules; (b) amplifying the segments of the at least a first and second DNA substrate molecules with the primers of (a) in a polymerase chain reaction; (c) assembling the products of (b) to generate a pool of recombinant DNA molecules; (d) selecting or screening the products of (c) for a desired property; and (e) recovering a recombinant DNA substrate molecule from the products of (d) encoding an evolved protein.
 66. The method of claim 65, wherein (a)-(e) are repeated.
 67. The method of claim 65, wherein the at least a first and second DNA substrate molecule are subjected to mutagenesis prior to (a).
 68. The method of claim 65, wherein the at least a first and second DNA substrate molecule comprise sequences encoding protein homologues.
 69. The method of claim 65, wherein the primers comprise a uracil substitution at one or more thymidine residues.
 70. The method of claim 69, wherein the products of (b) are treated with uracil glycosylase.
 71. The method of claim 65, wherein the at least a first and second DNA substrate molecule encodes all or part of a protein selected from Table I.
 72. The method of claim 65, wherein the at least a first and second DNA substrate molecule comprises a gene cluster.
 73. An evolved protein produced by the method of claim 65.
 74. The method of claim 65, wherein at least one PCR primer differs from the at least a first and second substrate molecules in at least one nucleotide.
 75. The method of claim 74, wherein the PCR primer comprises a nucleotide sequence of a known mutant or polymorphism of the at least a first or second substrate molecule.
 76. The method of claim 75, wherein the PCR primer is degenerate and encodes the nucleotide sequences of more than one known mutant or polymorphism of the at least a first or second substrate molecule.
 77. The method of claim 67, wherein mutagenesis comprises recursive sequence recombination.
 78. The method of claim 65, wherein the products of (e) are subjected to mutagenesis.
 79. The method of claim 78, wherein mutagenesis comprises recursive sequence recombination.
 80. The method of claim 65, wherein the products of (e) are used as a DNA substrate molecule in (b).
 81. The method of claim 65, wherein the recombinant DNA substrate molecule of (e) comprises a library of recombinant DNA substrate molecules.
 82. A method for optimizing expression of a protein by evolving the protein, wherein the protein is encoded by a DNA substrate molecule, comprising: (a) providing a set of oligonucleotides, wherein each oligonucleotide comprises at least two regions complementary to the DNA molecule and at least one degenerate region, each degenerate region encoding a region of an amino acid sequence of the protein; (b) assembling the set of oligonucleotides into a library of full length genes; (c) expressing the products of (b) in a host cell; (d) screening the products of (c) for improved expression of the protein; and (e) recovering a recombinant DNA substrate molecule encoding an evolved protein from (d).
 83. The method of claim 82, wherein the primers comprise about 20 nucleotides complementary to the DNA substrate molecule followed by a second region of about 20 degenerate nucleotides of homology with the DNA substrate molecules followed by about 20 nucleotides complementary to the DNA substrate.
 84. The method of claim 82, wherein the protein is bovine intestinal alkaline phosphatase.
 85. The method of claim 84, wherein the oligonucleotides comprise one or more primers from Table II.
 86. The method of claim 82, wherein the DNA substrate molecule encodes all or part of a protein selected from Table I.
 87. The method of claim 82, wherein the DNA molecule comprises a gene cluster.
 88. The method of claim 82, wherein (a)-(e) are repeated.
 89. The method of claim 82, wherein the oligonucleotides comprise at least 5′ and 3′ nucleotide complementary to the DNA substrate molecule and about 20-300 nucleotides having up to about 85% sequence homology with a region of the DNA substrate molecule.
 90. The method of claim 89, wherein the oligonucleotides comprise a set of oligonucleotides in which each oligonucleotide overlaps with a second oligonucleotide.
 91. The method of claim 82, wherein the products of (e) are subjected to mutagenesis.
 92. The method of claim 91, wherein mutagenesis comprises recursive sequence recombination.
 93. The method of claim 82, wherein the recombinant DNA substrate molecule of (e) comprises a library of recombinant DNA substrate molecules.
 94. An evolved protein produced by the method of claim 82.
 95. A method for optimizing expression of a protein encoded by a DNA substrate molecule by evolving the protein, wherein the DNA substrate molecule comprises at least one lac operator and a fusion of a DNA sequence encoding the protein with a DNA sequence encoding a lac headpiece dimer, the method comprising: (a) transforming a host cell with a library of mutagenized DNA substrate molecules; (b) inducing expression of the protein encoded by the library of (a); (c) preparing an extract of the product of (b); (d) fractionating insoluble protein from complexes of soluble protein and DNA; and (e) recovering a DNA substrate molecule encoding an evolved protein from (d).
 96. The method of claim 95, wherein (a)-(e) are repeated.
 97. The method of claim 95, wherein the DNA substrate molecule encodes all or part of a protein selected from Table I.
 98. An evolved protein produced by the method of claim 95.
 99. The method of claim 95, wherein the products of (e) are subjected to mutagenesis.
 100. The method of claim 99, wherein mutagenesis comprises recursive sequence recombination.
 101. The method of claim 95, wherein the products of (e) are used as a DNA substrate molecule in (a).
 102. The method of claim 95, wherein the recombinant DNA substrate molecule of (e) comprises a library of recombinant DNA substrate molecules.
 103. A method for evolving functional expression of a protein encoded by a DNA substrate molecule comprising a fusion of a DNA sequence encoding the protein with a DNA sequence encoding filamentous phage protein to generate a fusion protein, the method comprising: (a) providing a host cell producing infectious particles expressing a fusion protein encoded by a library of mutagenized DNA substrate molecules; (b) recovering from (a) infectious particles displaying the fusion protein; (c) affinity purifying particles displaying the mutant protein using a ligand for the protein; and (d) recovering a DNA substrate molecule encoding an evolved protein from affinity purified particles of (c).
 104. The method of claim 103, wherein (a)-(d) are repeated.
 105. The method of claim 103, wherein the DNA substrate molecule encodes all or part of a protein selected from Table I.
 106. An evolved protein produced by the method of claim 103.
 107. The method of claim 103, wherein the products of (d) are subjected to mutagenesis.
 108. The method of claim 107, wherein mutagenesis comprises recursive sequence recombination.
 109. The method of claim 107, wherein the products of claim 107 are used as a DNA substrate molecule in (a).
 110. The method of claim 103, wherein the DNA substrate molecule of (e) comprises a library of DNA substrate molecules.
 111. The method of claim 103, wherein DNA sequence encoding the filamentous phage protein comprises a phagemid.
 112. The method of claim 103, wherein DNA sequence encoding the filamentous phage protein comprises a phage.
 113. A method for optimizing expression of a protein encoded by a DNA substrate molecule comprising a fusion of a DNA sequence encoding the protein with a DNA substrate encoding a lac headpiece dimer, wherein the DNA substrate molecule is present on a first plasmid vector, the method comprising: (a) providing a host cell transformed with the first vector and a second vector comprising a library of mutants of at least one chaperonin gene and at least one lac operator; (b) preparing an extract of the product of (a); (c) fractionating insoluble protein from complexes of soluble protein and DNA; and (d) recovering DNA encoding a chaperonin gene from (c).
 114. The method of claim 113, wherein the DNA substrate molecule encodes all or part of a protein selected from Table I.
 115. The method of claim 113, wherein the DNA substrate is subjected to mutagenesis independently of the chaperonin gene prior to (a).
 116. The method of claim 113, wherein the DNA of (d) comprises a library of mutants.
 117. The method of claim 113, wherein the first and second vectors are the same vector.
 118. The method of claim 113, wherein (d) further comprises recovering an evolved DNA substrate molecule from the products of (c).
 119. An evolved chaperonin produced by the method of claim 113.
 120. An evolved protein produced by the method of claim 113.
 121. The method of claim 113, wherein (a)-(d) are repeated.
 122. The method of claim 113, wherein the products of (d) are subjected to mutagenesis.
 123. The method of claim 122, wherein mutagenesis comprises recursive sequence recombination.
 124. The method of claim 122, wherein the products of claim 122 are used in (a).
 125. A method for optimizing expression of a protein encoded by a DNA substrate molecule comprising a fusion of a DNA sequence encoding the protein with a filamentous phage gene, wherein the fusion is carried on a phagemid comprising a library of chaperonin gene mutants, the method comprising: (a) providing a host cell producing infectious particles expressing a fusion protein encoded by a library of mutagenized DNA substrate molecules; (b) recovering from (a) infectious particles displaying the fusion protein; (c) affinity purifying particles displaying the protein using a ligand for the protein; and (d) recovering DNA encoding the mutant chaperonin from affinity purified particles of (c).
 126. The method of claim 125, wherein (a)-(d) are repeated.
 127. The method of claim 125, wherein the DNA substrate molecule encodes all or part of a protein selected from Table I.
 128. An evolved chaperonin produced by the method of claim 125.
 129. An evolved protein produced by the method of claim 125.
 130. The method of claim 125, wherein the products of (d) are subjected to mutagenesis.
 131. The method of claim 130, wherein mutagenesis comprises recursive sequence recombination.
 132. The method of claim 130, wherein the products of claim 130 are used in (a).
 133. The method of claim 125, wherein the DNA of (d) comprises a library of DNA substrate molecules.
 134. The method of claim 125, wherein the DNA substrate molecule comprises a library of mutagenized DNA sequences encoding the protein of interest.
 135. The method of claim 125, wherein (d) further comprises recovering DNA encoding the protein from affinity purified particles of (c).
 136. A method for optimizing secretion of a protein in a host by evolving a gene encoding a secretory function, comprising: (a) providing a cluster of genes encoding secretory functions; (b) recombining at least a first and second sequence in the gene cluster of (a) encoding a secretory function, the at least a first and second sequences differing from each other in at least one nucleotide, to generate a library of recombinant sequences; (c) transforming a host cell culture with the products of (b), wherein the host cell comprises a DNA sequence encoding the protein; (d) subjecting the product of (c) to screening or selection for secretion of the protein; and (e) recovering DNA encoding an evolved gene encoding a secretory function from the product of (d).
 137. The method of claim 136, wherein the gene cluster comprises at least one recognition site for a restriction endonuclease having nonpalindromic ends at the cleavage site.
 138. The method of claim 136, wherein the host is E. coli, yeast, Bacillus, Pseudomonas, or a mammalian cell.
 139. The method of claim 136, wherein the protein is a thermostable DNA polymerase.
 140. The method of claim 136, wherein protein is inducibly expressed.
 141. The method of claim 136, wherein the protein is linked to a secretory leader sequence.
 142. A secretory gene evolved by the method of claim 136.
 143. The method of claim 136, wherein (a)-(e) are repeated.
 144. The method of claim 136, wherein the DNA sequence of (c) encodes all or part of a protein selected from Table I.
 145. The method of claim 136, wherein the DNA sequence of (c) comprises a library of mutant sequences.
 146. The method of claim 136, wherein the products of (e) are subjected to mutagenesis.
 147. The method of claim 146, wherein mutagenesis comprises recursive sequence recombination.
 148. The method of claim 146, wherein the products of claim 146 are used in (a).
 149. The method of claim 136, wherein the DNA of (e) comprises a library of evolved genes.
 150. A method for evolving an improved DNA polymerase comprising: (a) providing a library of mutant DNA substrate molecules encoding mutant DNA polymerase; (b) screening extracts of cells transfected with (a) and comparing activity with wild type DNA polymerase; (c) recovering mutant DNA substrate molecules from cells in (b) expressing mutant DNA polymerase having improved activity over wild-type DNA polymerase; and (d) recovering a DNA substrate molecule encoding an evolved polymerase from the products of (c).
 151. The method of claim 150, wherein the improved activity is at least one of the group of higher quality sequencing ladder, less termination of reactions with inosine, improved acceptance of base analogs, improved acceptance of dideoxy nucleotides, and longer sequencing ladders.
 152. The method of claim 150, wherein the products of (a) are expressed under control of arabinose promoter in an E. coli host having a mutant host DNA polymerase.
 153. The method of claim 150, wherein (a)-(d) are repeated.
 154. An evolved DNA polymerase produced by the method of claim 150.
 155. The method of claim 150, wherein the products of (d) are subjected to mutagenesis.
 156. The method of claim 155, wherein mutagenesis comprises recursive sequence recombination.
 157. The method of claim 155, wherein the products of claim 155 are used in (a).
 158. The method of claim 150, wherein the DNA substrate molecule of (d) comprises a library of DNA substrate molecules.
 159. A method for evolving a DNA polymerase with an error rate greater than that of wild type DNA polymerase comprising: (a) providing a library of mutant DNA substrate molecules encoding mutant DNA polymerase in a host cell comprising an indicator gene having a revertible mutation, wherein the indicator gene is replicated by the mutant DNA polymerase; (b) screening the products of (a) for revertants of the indicator gene; (c) recovering mutant DNA substrate molecules from revertants; and (d) recovering a DNA substrate molecule encoding an evolved polymerase from the products of (c).
 160. The method of claim 159, wherein the indicator gene is LacZalpha or GFP.
 161. The method of claim 159 wherein the revertible mutation is a stop codon.
 162. The method of claim 159, wherein the host cell comprises a mutant host DNA polymerase.
 163. A method for evolving a DNA polymerase, comprising: (a) providing a library of mutant DNA substrate molecules encoding mutant DNA polymerase, the library comprising a plasmid vector; (b) preparing plasmid preparations and extracts of host cells transfected with the products of (a); (c) amplifying each plasmid preparation in a PCR reaction using the mutant polymerase encoded by that plasmid, the polymerase being present in the host cell extract; (d) recovering the PCR products of (c); and (e) recovering a DNA substrate molecule encoding an evolved polymerase from the products of (d).
 164. The method of claim 163, wherein the reaction of (c) is carried out in the presence of an organic solvent, a base analog, or inosine.
 165. The method of claim 163, wherein (a)-(e) are repeated.
 166. An evolved polymerase produced by the method of claim 163.
 167. The method of claim 163, wherein the products of (e) are subjected to mutagenesis.
 168. The method of claim 167, wherein mutagenesis comprises recursive sequence recombination.
 169. The method of claim 167, wherein the products of claim 167 are used in (a).
 170. The method of claim 163, wherein the DNA substrate molecule of (e) comprises a library of DNA substrate molecules.
 171. A method for evolving a p-nitrophenol phosphonatase from a phosphonatase encoded by a DNA substrate molecule, comprising: (a) providing a library of mutants of the DNA substrate molecule, the library comprising a plasmid expression vector; (b) transfecting a host, wherein the host phn operon is deleted; (c) selecting for growth of the transfectants of (b) using a p-nitrophenol phosphonate as a substrate; (d) recovering the DNA substrate molecules from transfectants selected from (c); and (e) recovering a DNA substrate molecule from (d) encoding an evolved phosphonatase.
 172. The method of claim 171, wherein (a)-(e) are repeated.
 173. The method of claim 171, wherein the phosphonatase is selected from the group consisting of beta-lactamase and alkylphosphonatase.
 174. An evolved p-nitrophenol phosphonatase produced by the method of claim 173.
 175. The method of claim 171, wherein the products of (e) are subjected to mutagenesis.
 176. The method of claim 175, wherein mutagenesis comprises recursive sequence recombination.
 177. The method of claim 175, wherein the products of claim 175 are used in (a).
 178. The method of claim 171, wherein the DNA substrate molecule of (e) comprises a library of DNA substrate molecules.
 179. A method for evolving a protease encoded by a DNA substrate molecule comprising: (a) providing a library of mutants of the DNA substrate molecule, the library comprising a plasmid expression vector, wherein the DNA substrate molecule is linked to a secretory leader; (b) transfecting a host; (c) selecting for growth of the transfectants of (b) on a complex protein medium; and (d) recovering a DNA substrate molecule from (c) encoding an evolved protease.
 180. The method of claim 179, wherein (a)-(d) arerepeated.
 181. An evolved subtilisin produced by the method of claim179.
 182. The method of claim 179, wherein the products of (d) aresubjected to mutagenesis.
 183. The method of claim 182, whereinmutagenesis comprises recursive sequence recombination.
 184. The methodof claim 182, wherein the products of claim 184 are used in (a). 185.The method of claim 179, wherein the DNA substrate molecule of (d)comprises a library of DNA substrate molecules.
 186. The method of claim179, wherein the protease is a subtilisin.
 187. A method for screening a library of protease mutants displayed on a phage to obtain an improved protease, wherein a DNA substrate molecule encoding the protease is fused to DNA encoding a filamentous phage protein to generate a fusion protein, comprising: (a) providing host cells expressing the fusion protein; (b) overlaying host cells with a protein net to entrap the phage; (c) washing the product of (b) to recover phage liberated by digestion of the protein net; (d) recovering DNA from the product of (c); and (e) recovering a DNA substrate from (d) encoding an improved protease.
 188. The method of claim 187, wherein (a)-(e) are repeated.
 189. An evolved protease produced by the method of claim 187.
 190. The method of claim 187, wherein the products of (e) are subjected to mutagenesis.
 191. The method of claim 190, wherein mutagenesis comprises recursive sequence recombination.
 192. The method of claim 190, wherein the products of claim 190 are used in (a).
 193. The method of claim 187, wherein the DNA substrate molecule of (e) comprises a library of DNA substrate molecules.
 194. A method for screening a library of protease mutants to obtain an improved protease, the method comprising: (a) providing a library of peptide substrates, the peptide substrate comprising a fluorophore and a fluorescence quencher; (b) screening the library of protease mutants for ability to cleave the peptide substrates, wherein fluorescence is measured; and (c) recovering DNA encoding at least one protease mutant from (b).
 195. A method for evolving an alpha interferon gene comprising: (a) providing a library of mutant alpha interferon genes, the library comprising a filamentous phage vector; (b) stimulating cells comprising a reporter construct, the reporter construct comprising a reporter gene under control of an interferon responsive promoter, and wherein the reporter gene is GFP; (c) separating the cells expressing GFP by FACS; (d) recovering phage from the product of (c); and (e) recovering an evolved interferon gene from the product of (d).
 196. The method of claim 195, wherein the interferon responsive promoter is an MHC I promoter.
 197. The method of claim 195, wherein (a)-(e) are repeated.
 198. An evolved interferon produced by the method of claim 195.
 199. The method of claim 195, wherein the products of (e) are subjected to mutagenesis.
 200. The method of claim 199, wherein mutagenesis comprises recursive sequence recombination.
 201. The method of claim 199, wherein the products of claim 199 are used in (a).
 202. The method of claim 195, wherein the evolved interferon gene of (e) comprises a library of genes.
 203. A method for screening a library of mutants of a DNA substrate encoding a protein for an evolved DNA substrate, comprising: (a) providing a library of mutants, the library comprising an expression vector; (b) transfecting a mammalian host cell with the library of (a), wherein mutant protein is expressed on the surface of the cell; (c) screening or selecting the products of (b) with a ligand for the protein; (d) recovering DNA encoding mutant protein from the products of (c); and (e) recovering an evolved DNA substrate from the products of (d).
 204. The method of claim 203, wherein the ligand is an antibody.
 205. The method of claim 203, wherein the ligand is a substrate and the protein is an enzyme.
 206. The method of claim 203, wherein the expression vector comprises an SV40 origin and the host cell is a Cos cell.
 207. The method of claim 203, wherein the mutant protein is expressed transiently.
 208. The method of claim 203, wherein the host cell further comprises SV40 large T antigen.
 209. The method of claim 203, wherein the protein is an antibody.
 210. The method of claim 203, wherein (a)-(e) are repeated.
 211. The method of claim 203, wherein the DNA substrate molecule encodes all or part of a protein selected from Table I.
 212. An evolved protein produced by the method of claim 203.
 213. The method of claim 203, wherein the products of (e) are subjected to mutagenesis.
 214. The method of claim 213, wherein mutagenesis comprises recursive sequence recombination.
 215. The method of claim 213, wherein the products of claim 213 are used in (a).
 216. The method of claim 203, wherein the DNA substrate molecule of (e) comprises a library of DNA substrate molecules.
 217. A method for evolving a DNA substrate molecule encoding an interferon alpha, comprising: (a) providing a library of mutant alpha interferon genes, the library comprising an expression vector wherein the alpha interferon genes are expressed under the control of an inducible promoter; (b) transfecting host cells with the library of (a); (c) contacting the product of (b) with a virus; (d) recovering DNA encoding a mutant alpha interferon from host cells surviving step (c); and (e) recovering an evolved interferon gene from the product of (d).
 218. The method of claim 217, wherein the promoter is a metallothionein promoter.
 219. The method of claim 217, wherein the virus is HIV.
 220. The method of claim 217, wherein the virus further comprises a conditionally lethal gene.
 221. The method of claim 217, wherein the conditionally lethal gene is thymidine kinase.
 222. The method of claim 217, wherein the transfected cells are exposed to conditionally lethal selective conditions.
 223. The method of claim 217, wherein (a)-(e) are repeated.
 224. An evolved IFNα polymerase produced by the method of claim 217.
 225. The method of claim 217, wherein the products of (e) are subjected to mutagenesis.
 226. The method of claim 225, wherein mutagenesis comprises recursive sequence recombination.
 227. The method of claim 225, wherein the products of claim 218 are used in (a).
 228. The method of claim 217, wherein the DNA substrate molecule of (e) comprises a library of DNA substrate molecules.
 229. A method for evolving the stability of a protein encoded by a DNA substrate molecule, the DNA substrate molecule comprising a fusion of a DNA sequence encoding the protein with a DNA sequence encoding a filamentous phage protein to generate a fusion protein, the method comprising: (a) providing a host cell expressing a library of mutants of the fusion protein; (b) affinity purifying the mutants with a ligand for the protein, wherein the ligand is a human serum protein, tissue specific protein, or receptor; (c) recovering DNA encoding a mutant protein from the affinity selected mutants of (b); and (d) recovering an evolved gene encoding the protein from the product of (c).
 230. The method of claim 229, wherein the serum protein is serum albumin, immunoglobulin, lipoprotein, haptoglobin, fibrinogen, transferrin, alpha-1 anti-trypsin, or alpha-2 macroglobulin.
 231. The method of claim 229, wherein the DNA sequence encoding the filamentous phage protein comprises a phage.
 232. The method of claim 229, wherein the DNA sequence encoding the filamentous phage protein comprises a phagemid.
 233. The method of claim 229, wherein the products of step (a) are derivatized with a half-life extending moiety.
 234. The method of claim 229, wherein the moiety is polyethylene glycol.
 235. The method of claim 229, wherein the DNA substrate molecule comprises a fusion of nucleic acid encoding the protein with nucleic acid encoding an epitope tag.
 236. The method of claim 235, wherein the products of (a) are contacted with a protease prior to (b).
 237. The method of claim 235, wherein the ligand is an antibody specific for the epitope tag.
 238. The method of claim 229, wherein the protein is selected from Table I.
 239. The method of claim 229, wherein the products of (a) are subjected to heat, metal ions, non-physiological pH, lyophilization, or freeze-thawing before (b).
 240. The method of claim 229, wherein (a)-(e) are repeated.
 241. An evolved polymerase produced by the method of claim 229.
 242. The method of claim 229, wherein the products of (d) are subjected to mutagenesis.
 243. The method of claim 242, wherein mutagenesis comprises recursive sequence recombination.
 244. The method of claim 242, wherein the products of claim 242 are used in (a).
 245. The method of claim 229, wherein the evolved gene of (d) comprises a library of DNA substrate molecules.
 246. A method for evolving a protein having at least two subunits, comprising: (a) providing a library of mutant DNA substrate molecules for each subunit; (b) recombining the libraries into a library of single chain constructs of the protein, the single chain construct comprising a DNA substrate molecule encoding each subunit sequence, the subunit sequence being linked by a linker at a nucleic acid sequence encoding the amino terminus of one subunit to a nucleic acid sequence encoding the carboxy terminus of a second subunit; (c) screening or selecting the products of (b); (d) recovering recombinant single chain construct DNA substrate molecules from the products of (c); (e) subjecting the products of (d) to mutagenesis; and (f) recovering an evolved single chain construct DNA substrate molecule from (e).
 247. The method of claim 246, wherein the products of (b) are displayed on a phage.
 248. The method of claim 246, wherein the protein is selected from Table I.
 249. The method of claim 246, wherein (a)-(f) are repeated.
 250. An evolved protein produced by the method of claim 246.
 251. The method of claim 246, wherein the products of (f) are subjected to mutagenesis.
 252. The method of claim 246, wherein mutagenesis comprises recursive sequence recombination.
 253. The method of claim 246, wherein the products of claim 246 are used in (a).
 254. The method of claim 246, wherein the evolved DNA substrate molecule of (f) comprises a library of DNA substrate molecules.
 255. A method for evolving the coupling of a mammalian 7-transmembrane receptor to a yeast signal transduction pathway, comprising: (a) expressing a library of mammalian G alpha protein mutants in a host yeast cell, wherein the host cell expresses the mammalian 7-transmembrane receptor and a reporter gene, the receptor gene being expressed under control of a yeast pheromone responsive promoter; (b) screening or selecting the products of (a) for expression of the reporter gene in the presence of a ligand for the 7-transmembrane receptor; and (c) recovering DNA encoding an evolved G alpha protein mutant from screened or selected products of (b).
 256. The method of claim 255, wherein the products of (c) are subjected to mutagenesis.
 257. The method of claim 256, wherein mutagenesis comprises recursive sequence recombination.
 258. The method of claim 255, wherein the products of claim 255 are used in (a).
 259. The method of claim 255, wherein (a)-(c) are repeated.
 260. An evolved G alpha protein produced by the method of claim 255.
 261. The method of claim 255, wherein the reporter gene is luciferase.
 262. The method of claim 255, wherein the pheromone responsive promoter is positively regulated by GAL4 and wherein GAL4 is expressed under the control of a pheromone sensitive, GAL4 enhanced promoter.
 263. A method for recombining at least a first and second DNA substrate molecule, comprising: (a) transfecting a host cell with at least a first and second DNA substrate molecule wherein the at least a first and second DNA substrate molecules are recombined in the host cell; (b) screening or selecting the products of (a) for a desired property; and (c) recovering recombinant DNA substrate molecules from (b).
 264. The method of claim 263, wherein the products of (c) are subjected to mutagenesis.
 265. The method of claim 264, wherein the mutagenesis comprises recursive sequence recombination.
 266. The method of claim 263, wherein (a)-(c) are repeated.
 267. The method of claim 263, wherein the products of claim 263 are used in (a).
 268. A method for evolving a DNA substrate sequence encoding a protein of interest, wherein the DNA substrate comprises a vector, the vector comprising single-stranded DNA, the method comprising: (a) providing single-stranded vector DNA and a library of mutants of the DNA substrate sequence; (b) annealing denatured double-stranded DNA from the library of (a) to the single stranded vector DNA of (a); (c) transforming the products of (b) into a host; (d) screening the product of (c) for a desired property; and (e) recovering evolved DNA substrate DNA from the products of (d).
 269. The method of claim 268, wherein the product of (e) is subjected to mutagenesis.
 270. The method of claim 269, wherein mutagenesis comprises recursive sequence recombination.
 271. The method of claim 269, wherein the product of claim 269 is used in (a).
 272. The method of claim 268, wherein the host is a mutS host.
 273. The method of claim 268, wherein the vector is a phagemid.